[DOCUMENT NAME] SPECIFICATION

[TITLE OF THE INVENTION] IMAGE PROCESSING METHOD AND APPARATUS

[CLAIMS] 

[Claim 1] An image processing method for evaluating matching 
between a template image and an input image by use of a similarity value 
map, the method including a step of generating an evaluation vector for 
each of the template image and the input image, wherein the evaluation 
vector includes a component in which an edge normal direction vector of a 
specified image undergoes even-numbered times angular transformation. 

[Claim 2] An image processing method comprising:

a step of inputting a specified image for both a template image 
and an input image and calculating an edge normal direction vector of the 
specified image; 

a step of generating an evaluation vector from the edge normal 
direction vector; 

a step of subjecting the evaluation vector to orthogonal
transformation;

a step of performing a product sum calculation of corresponding 
spectral data for each evaluation vector that has been subjected to 
orthogonal transformation and has been obtained for the template image 
and input image; and 

a step of subjecting a result of the product sum calculation to 
inverse orthogonal transformation and generating a map of similarity 
values; 

wherein a formula of the said similarity values, the said 
orthogonal transformation, and the said inverse orthogonal transformation 
each have linearity. 

[Claim 3] The image processing method of Claim 2, further comprising
a step of compressing each evaluation vector that has been subjected to 
orthogonal transformation so as to reduce a processing amount. 

[Claim 4] The image processing method of Claim 2 or 3, wherein for
the template image, the steps taken until the evaluation vector that has
been subjected to orthogonal transformation is compressed are executed
before the input image is input, and the result is stored in a recording
means.

[Claim 5] The image processing method of any one of Claims 2 to 4, wherein the
evaluation vector is normalized with respect to a vector length. 

[Claim 6] The image processing method of any one of Claims 2 to 5, wherein the
evaluation vector of the template image is normalized by the number of 
edge normal direction vectors. 

[Claim 7] The image processing method of any one of Claims 2 to 6, wherein a data
amount is reduced by use of complex conjugate properties of orthogonal 
transformation before performing a product sum calculation, and the data 
amount is restored after performing the product sum calculation. 

[Claim 8] The image processing method of any one of Claims 2 to 7, wherein the
template image is enlarged/reduced to various sizes, and the evaluation 
vector of each size is subjected to addition processing. 

[Claim 9] The image processing method of Claim 8, wherein for the
template image, the addition processing of the evaluation vector is carried
out after executing the step of compressing each evaluation vector so as to
reduce the processing amount.

[Claim 10] The image processing method of any one of Claims 2 to 9, wherein the
template image is an image of a typified face. 

[Claim 11] The image processing method of any one of Claims 2 to 10, wherein a
peak pattern that makes a peak of the similarity value steep is prepared, 
and a result obtained by subjecting data of this peak pattern to orthogonal 
transformation is applied to the product sum calculation. 

[Claim 12] The image processing method of any one of Claims 2 to 10, wherein a
mask pattern that depends on the template image is formed, and a result 
obtained by subjecting data of this mask pattern to orthogonal 
transformation is applied to the product sum calculation. 

[Claim 13] The image processing method of Claim 12, wherein the said
mask pattern shows an average of a number of pixels in an image of the 
template image. 

[Claim 14] The image processing method of any one of Claims 2 to 12, further
comprising a step of, for the template image, processing positive and
negative signs of the evaluation vector of the original template image and
generating an evaluation vector of a bilaterally symmetrical image with
respect to the original template image, wherein the generated evaluation
vector is applied to the product sum calculation.

[Claim 15] The image processing method of Claim 10, wherein a map 
of point biserial correlation coefficients is generated on the basis of an 
extracted face image, and a position of the face part is calculated. 

[Claim 16] The image processing method of Claim 10, wherein a 
distribution of projection values in a y-direction is calculated on the basis 
of the extracted face image by use of the mask pattern, and two maximum 
points are calculated from this distribution, and a range between these
maximum points is output as a mouth range. 

[Claim 17] The image processing method of Claim 10, wherein the 
input image is divided into only the face image and parts other than the 
face image on the basis of the extracted face image, and a digital 
watermark is embedded only into the face image, and the face image into 
which the digital watermark has been embedded and parts other than the 
face image are combined together and are output. 



[Claim 18] The image processing method of Claim 10, wherein the 
input image is divided into only the face image and parts other than the 
face image on the basis of the extracted face image, and only the face 
image is edited, and the face image that has been edited and parts other 
than the face image are combined together and are output.

[Claim 19] An image processing apparatus comprising:

a template image processing part for inputting a template image 
and calculating an edge normal direction vector of the template image, 
generating an evaluation vector from the edge normal direction vector, 
subjecting the evaluation vector to orthogonal transformation, and 
compressing the evaluation vector that has been subjected to the 
orthogonal transformation so as to reduce the processing amount; 

an input image processing part for inputting an input image and 
calculating an edge normal direction vector of the input image, generating 
an evaluation vector from the edge normal direction vector, subjecting 
the evaluation vector to orthogonal transformation, and compressing the 
evaluation vector that has been subjected to the orthogonal transformation
so as to reduce the processing amount;

multiplication means for performing a product sum calculation of 
corresponding spectral data about each evaluation vector that has been 
subjected to the orthogonal transformation and has been obtained for the 
template image and the input image; and 

inverse orthogonal transformation means for subjecting a result of 
the product sum calculation to inverse orthogonal transformation and 
generating a map of similarity values; 

wherein the evaluation vector includes a component in which an 
edge normal direction vector of a specified image undergoes 
even-numbered times angular transformation, and a formula of the 
similarity values, the orthogonal transformation, and the inverse 
orthogonal transformation each have linearity. 

[Claim 20] The image processing apparatus of Claim 19, wherein the
said template image processing part is provided with a recording means
for recording the evaluation vector that has been compressed to reduce a
processing amount and that has been subjected to orthogonal
transformation, and a result obtained by compressing the evaluation vector
that has been subjected to orthogonal transformation is stored in the
recording means before inputting the input image.

[Claim 21] The image processing apparatus of Claim 19, further
comprising: a conjugate compression means, provided between the said 
recording means and the said multiplication means, for reducing the data 
amount by use of complex conjugate properties of orthogonal 
transformation; and a conjugate restoring means, provided between the 
said multiplication means and the said inverse orthogonal transformation 
means, for restoring the data amount reduced by use of the complex 
conjugate properties of orthogonal transformation. 

[Claim 22] The image processing apparatus of any one of Claims 19 to 21, further
comprising: an enlargement/reduction means for enlarging/reducing the 
template image to various sizes; and an addition means for performing 
addition processing of the evaluation vector of each size. 

[Claim 23] The image processing apparatus of Claim 22, wherein the
said addition means performs addition processing of the evaluation vector
of the template image after compressing the vector so as to reduce the
processing amount.

[Claim 24] The image processing apparatus of Claim 19, further
comprising a peak pattern processing part for subjecting a peak pattern by 
which a peak of a similarity value is made steep to orthogonal 
transformation and compressing the peak pattern that has been subjected 
to the orthogonal transformation so as to reduce the processing amount, 
wherein a result obtained by subjecting data of this peak pattern to the 
orthogonal transformation is applied to a product sum calculation of the 
said multiplication means. 

[Claim 25] The image processing apparatus of Claim 19, further
comprising a mask pattern processing part for forming a mask pattern that 
depends on the template image and generating data obtained by subjecting 
data of this mask pattern to orthogonal transformation and by compressing 
it, wherein a processing result of the said mask pattern processing part is 
applied to a product sum calculation of the said multiplication means.

[Claim 26] The image processing apparatus of Claim 25, wherein the
said mask pattern shows a mean of the number of pixels inside an image
of the template image.

[Claim 27] The image processing apparatus of Claim 20, further
comprising a symmetric vector generation means for processing positive 
and negative signs of the evaluation vector of an original template image 
recorded in the said recording means, and generating an evaluation vector 
of a bilaterally symmetric image with respect to the original template 
image, wherein the evaluation vector generated by the symmetric vector 
generation means is applied to a product sum calculation of the said 
multiplication means. 

[Claim 28] The image processing apparatus of Claim 19, further
comprising an η map forming means for forming a map of a point biserial
correlation coefficient on the basis of an extracted face image, and an 
extraction means for calculating a position of a face part from the formed 
map. 

[Claim 29] The image processing apparatus of Claim 19, further
comprising a maximum point extraction means for calculating a projection
value distribution in a y direction by use of a mask pattern on the basis of
an extracted face image, calculating two maximum points from this
distribution, and outputting a range between the maximum points as a
mouth range.

[Claim 30] The image processing apparatus of Claim 19, further
comprising: a face image cutting-out means for separating an input image 
into only a face image and parts excluding the face image on the basis of 
an extracted face image; a digital watermark embedding means for 
embedding a digital watermark only into the face image; and an image 
synthesizing means for combining the face image into which the digital 
watermark has been embedded with parts excluding the face image and 
outputting them. 

[Claim 31] The image processing apparatus of Claim 19, comprising a
face image cutting-out means for separating an input image into only a
face image and parts excluding the face image on the basis of an extracted
face image; an image edge correction means for editing only the face
image; and an image synthesizing means for combining an edited face
image with parts excluding the face image and outputting them.
[DETAILED DESCRIPTION] 
[0001]
[TECHNICAL FIELD] 

The present invention relates to an image processing method for 
detecting an object from an input image by use of a template image, and 
relates to an image processing apparatus therefor. 
[0002]
[PRIOR ART] 

Conventionally, a technique is well known in which a template 
image is pre-registered, and the position in an input image of an image 
similar to the template image is detected by pattern matching between the 
input image and the template image. 
[0003]

However, misrecognition is liable to occur depending on how the
background of the image similar to the template image is formed.
Japanese Published Unexamined Patent Application No. Hei-5-28273
discloses a technique that has been developed to solve this problem.

[0004]

In this publication, a similarity value between the template image 
and the image corresponding to the template image is defined by the 
following mathematical formula. 
[Formula 1]

Cv: Correlation coefficient (similarity value) 
M: Number of pixels of template image in x direction 
N: Number of pixels of template image in y direction 
Sx: Derivative value of input image S in x direction 
Sy: Derivative value of input image S in y direction 
Tx: Derivative value of template image T in x direction 
Ty: Derivative value of template image T in y direction 

In detail, the inner product (cos θ) of a normal direction vector of the
edge of the template image and a normal direction vector of the edge of
the input image, where θ is the angle between them, is a component of
the similarity value.

[0005]

[TECHNICAL PROBLEM]

However, there is a problem in that, as described later in detail, if
the brightness of the background around an image of an object is
uneven, the sign of the inner product is reversed, so that
the similarity value no longer reflects the real image, and
misrecognition easily occurs, thus making it difficult to obtain a
desirable recognition result.
[0006]

Additionally, the similarity value formula is nonlinear with respect 
to the normal direction vectors of the edges of the input and template 
images, and processing for the template image and processing for the input 
image must be performed simultaneously. 
[0007]

Further, the template image is scanned over the input image, and a
correlation calculation between the input image and the reference image
must be performed at each scanning point; in practice, the resulting
growth in the calculation amount makes real-time processing
impossible.

[0008]

It is therefore an object of the present invention to provide an 
image processing method and an image processing apparatus capable of 
obtaining an accurate, clear recognition result and capable of performing 
high speed processing. 
[0009]
[MEANS] 

In an image processing method according to a first aspect of the 
present invention, matching between a template image and an input image 
is evaluated by use of a similarity value map, and an evaluation vector is 
generated for each of the template and input images, and the evaluation 
vector includes a component in which a normal direction vector of an edge 
of a specified image undergoes even-numbered times angular 
transformation. 



[0010]

With this structure, an image processing method capable of 
obtaining a clear, accurate recognition result and capable of performing 
high speed processing can be realized. 
[0011] 

[EMBODIMENTS OF THE INVENTION] 

In an image processing method according to a first aspect of the 
present invention, matching between a template image and an input image 
is evaluated by use of a similarity value map, and an evaluation vector is 
generated for each of the template and input images, and the evaluation 
vector includes a component in which a normal direction vector of an edge 
of a specified image undergoes even-numbered times angular 
transformation. 

[0012] 

With this structure, the matching therebetween can be properly
evaluated with no influence on the similarity value even in a case in which
the sign of the inner product (cos θ) of a normal direction vector of the
edge of the template image and a normal direction vector of the edge of
the input image, where θ is the angle between them, is reversed because
of unevenness in brightness of the background.
[0013] 

An image processing method according to a second aspect of the 
present invention includes a step of inputting a specified image for each of 
a template image and an input image and calculating a normal direction 
vector of an edge of the specified image; a step of generating an evaluation 
vector from the edge normal direction vector; a step of subjecting the 
evaluation vector to an orthogonal transformation; a step of performing a 
product sum calculation of corresponding spectral data for each evaluation 
vector, which has been subjected to orthogonal transformation, obtained 
for the template image and input image; and a step of subjecting a result of 
the product sum calculation to an inverse orthogonal transformation and 
generating a map of similarity values; in which a formula of the similarity 
values, the orthogonal transformation, and the inverse orthogonal 
transformation each have linearity. 



[0014]

With this structure, a Fourier transformation value of the template 
image and a Fourier transformation value of the input image do not need 
to be simultaneously calculated. In other words, the Fourier 
transformation value of the template image can be obtained prior to that of 
the input image, thus making it possible to lighten the processing burden 
and improve processing speed. 
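The separation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses a one-dimensional signal instead of an image, a naive discrete Fourier transform (`dft`) in place of an FFT, and plain numbers instead of evaluation vectors. The function names `template_spectrum`, `similarity_map`, and `direct_correlation` are hypothetical and do not come from the specification.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out


def template_spectrum(template, n):
    """Zero-pad the template to length n and transform it (can be done once, in advance)."""
    padded = list(template) + [0.0] * (n - len(template))
    return dft(padded)


def similarity_map(signal, stored_spectrum):
    """Transform the input, multiply by the stored template spectrum, transform back."""
    spectrum = dft(signal)
    product = [s * t.conjugate() for s, t in zip(spectrum, stored_spectrum)]
    return [value.real for value in dft(product, inverse=True)]


def direct_correlation(signal, template):
    """Reference result: circular correlation computed by scanning the template."""
    n = len(signal)
    return [sum(t * signal[(i + j) % n] for j, t in enumerate(template))
            for i in range(n)]


signal = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 3.0, 0.0]
template = [1.0, 2.0, 1.0]

stored = template_spectrum(template, len(signal))  # prepared before the input arrives
fast = similarity_map(signal, stored)
slow = direct_correlation(signal, template)
```

Here `stored` depends only on the template, so it can be computed and kept before any input is available; `fast` and `slow` agree up to rounding error.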
[0015]

An image processing method according to a third aspect of the 
present invention includes a step of compressing each evaluation vector, 
which has been subjected to the orthogonal transformation, so as to reduce 
the processing amount. 
[0016] 

With this structure, what is processed is limited only to an effective 
component (e.g., low-frequency component), and processing speed can be 
improved. 

[0017]



In an image processing method according to a fourth aspect of the 
present invention, for the template image, the steps taken until the 
evaluation vector that has been subjected to the orthogonal transformation 
is compressed are executed before the input image is input, and its result is 
stored in a recording unit. 
[0018] 

With this structure, the processing relating to the template image is 
completed merely by reading from the recording unit, and processing 
speed can be improved. 
[0019] 

In an image processing method according to a fifth aspect of the 
present invention, the evaluation vector is normalized with respect to a 
vector length. 

[0020]

With this structure, the stability of pattern extraction can be
improved, because the evaluation is not affected by length variations
even though the edge strength of the input image, and hence the vector
length, varies according to photographic conditions.
[0021]

In an image processing method according to a sixth aspect of the 
present invention, the evaluation vector of the template image is 
normalized by the number of edge normal direction vectors. 

[0022]

Therefore, independent of whether the number of edges of the 
template image is large or small, a similarity can be evaluated on the same 
scale by dividing it by n and normalizing it. 
[0023]

In an image processing method according to a seventh aspect of the 
present invention, a data amount is reduced by use of complex conjugate 
properties of an orthogonal transformation before performing a product 
sum calculation, and the data amount is restored after performing the 
product sum calculation. 
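As a rough illustration of this aspect, the sketch below relies on the well-known property that the discrete Fourier transform X of real-valued data of length N satisfies X[N-k] = conj(X[k]), so only about half of the coefficients need to be kept. It assumes one-dimensional real data and a naive `dft` in place of an FFT; the names `conjugate_compress` and `conjugate_restore` are hypothetical and are not taken from the specification.

```python
import cmath


def dft(values):
    """Naive O(N^2) forward discrete Fourier transform, standing in for an FFT."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def conjugate_compress(spectrum):
    """Keep only bins 0..N/2; the remaining bins are conjugates for real input."""
    return spectrum[: len(spectrum) // 2 + 1]


def conjugate_restore(half, n):
    """Rebuild the full length-n spectrum using X[n-k] = conj(X[k])."""
    full = list(half)
    for k in range(len(half), n):
        full.append(half[n - k].conjugate())
    return full


signal = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 3.0, 0.0]
template = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
n = len(signal)

# Reduce the data amount before the product sum calculation ...
half_signal = conjugate_compress(dft(signal))
half_template = conjugate_compress(dft(template))
half_product = [s * t.conjugate() for s, t in zip(half_signal, half_template)]

# ... and restore it afterwards.
restored = conjugate_restore(half_product, n)

# Reference: the same product computed on the full spectra.
full_product = [s * t.conjugate() for s, t in zip(dft(signal), dft(template))]
```

The product of two conjugate-symmetric spectra is itself conjugate-symmetric, which is why `restored` matches `full_product` even though only five of the eight bins were multiplied.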
[0024]

With this structure, the data amount can be greatly reduced to
improve processing speed, and memory capacity can be saved.
[0025]

In an image processing method according to an eighth aspect of the 
present invention, the template image is enlarged/reduced to various sizes, 
and the evaluation vector of each size is subjected to addition processing. 
[0026]

With this structure, matching does not need to be repeatedly carried 
out for each size, and processing speed can be improved. 
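The reason a single matching pass suffices follows from linearity: correlating the input with a sum of templates gives the same map as summing the per-template correlations. The sketch below checks this on one-dimensional data. It is an illustration only; the lists `small` and `large` stand in for evaluation vectors of two template sizes, the left-aligned addition in `add_templates` is an assumption, and all names are hypothetical.

```python
def correlate(signal, template):
    """Circular correlation of a template with a signal (direct scanning)."""
    n = len(signal)
    return [sum(t * signal[(i + j) % n] for j, t in enumerate(template))
            for i in range(n)]


def add_templates(templates):
    """Element-wise sum of templates of different lengths (left-aligned, an assumption)."""
    width = max(len(t) for t in templates)
    total = [0.0] * width
    for template in templates:
        for j, value in enumerate(template):
            total[j] += value
    return total


signal = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 3.0, 0.0]
small = [1.0, -1.0]            # stand-in for a reduced template
large = [1.0, 0.0, 0.0, -1.0]  # stand-in for an enlarged template

combined = add_templates([small, large])
one_pass = correlate(signal, combined)          # one matching pass

per_size = [correlate(signal, t) for t in (small, large)]
summed = [a + b for a, b in zip(*per_size)]     # one pass per size, then add
```

`one_pass` and `summed` are equal, so the addition can be done once on the template side instead of repeating the matching for each size.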
[0027]

In an image processing method according to a ninth aspect of the 
present invention, for the template image, the addition processing of the 
evaluation vector is carried out after executing the step of compressing 
each evaluation vector so as to reduce the processing amount. 
[0028]

With this structure, what is subjected to addition processing can be 
reduced, and processing speed can be improved. 
[0029]



In an image processing method according to a tenth aspect of the 
present invention, the template image is an image of a typified face. 
[0030]

With this structure, not only the overall position of a face but also the
position of a main face part, such as the eyes, nose, or mouth, can
be recognized.

[0031]

In an image processing method according to an 11th aspect of the 
present invention, a peak pattern that makes a peak of the similarity value 
steep is prepared, and a result obtained by subjecting data of this peak 
pattern to an orthogonal transformation is applied to the product sum 
calculation. 

[0032]

With this structure, a part similar to a template can be detected 
from the input image more clearly and stably while reflecting the peak 
pattern. 

[0033]



In an image processing method according to a 12th aspect of the 
present invention, a mask pattern that depends on the template image is 
formed, and a result obtained by subjecting data of this mask pattern to an 
orthogonal transformation is applied to the product sum calculation. 
[0034]

With this structure, more precise detection can be performed by taking
into account attributes other than the shape of the template image.
[0035]

In an image processing method according to a 13th aspect of the 
present invention, the mask pattern shows an average of a number of 
pixels in an image of the template image. 
[0036]

With this structure, attributes of the template image can be reflected 
by a simple mask pattern. 
[0037]

An image processing method according to a 14th aspect of the
present invention further includes a step of, for the template image,
processing positive and negative signs of the evaluation vector of the
original template image and generating an evaluation vector of a
bilaterally symmetrical image with respect to the original template image,
wherein the generated evaluation vector is applied to the product sum
calculation.

[0038]

With this structure, the recording amount of the template image can 
be saved, and the evaluation vector of the template image that has been 
bilaterally reversed can be generated without direct calculation, thus 
making it possible to improve processing speed. 
[0039]

In an image processing method according to a 15th aspect of the 
present invention, a map of point biserial correlation coefficients is 
generated on the basis of an extracted face image, and a position of the 
face part is calculated. 

[0040]

With this structure, the position of the face part can be specified
more accurately.
[0041]

In an image processing method according to a 16th aspect of the 
present invention, a distribution of projection values in a y-direction is 
calculated on the basis of the extracted face image by use of the mask 
pattern, and two maximum points are calculated from this distribution, and 
an extent between these maximum points is output as a mouth range. 
[0042]

With this structure, the mouth range can be specified more
accurately.

[0043]

In an image processing method according to a 17th aspect of the 
present invention, the input image is divided into only the face image and 
parts other than the face image on the basis of the extracted face image, 
and a digital watermark is embedded only into the face image, and the face 
image into which the digital watermark has been embedded and parts other 
than the face image are combined and output.
[0044]

With this structure, watermark data can be embedded intensively
into the face part, which is liable to be falsified.
[0045]

In an image processing method according to an 18th aspect of the 
present invention, the input image is divided into only the face image and 
parts other than the face image on the basis of the extracted face image, 
and only the face image is edited, and the face image that has been edited 
and parts other than the face image are combined and output. 
[0046]

With this structure, only the face image can be corrected without
affecting parts other than the face.

[0047]
(Embodiment 1)

Referring to Fig. 1 through Fig. 8, the basic form of an image 
processing apparatus of the present invention will be described. 
[0048]

As shown in Fig. 1, the image processing apparatus according to
Embodiment 1 includes two different processing parts, i.e., a template 
image processing part 100 and an input image processing part 200, and 
evaluates matching between a template image and an input image by use 
of a map of similarity values L. In this image processing apparatus, both 
the template image processing part 100 and the input image processing 
part 200 perform orthogonal transformation with linearity, then perform 
multiplication, and calculate the similarity values L by inverse orthogonal 
transformation. 

[0049]

It should be noted that FFT (Fast Fourier Transform)
is used as the orthogonal transformation in all embodiments described later.
However, not only FFT but also the Hartley transformation or a number
theoretic transformation can be used, and "Fourier transformation" in the
following description can be replaced by the Hartley transformation or a
number theoretic transformation.
[0050]

Further, both the template image processing part 100 and the input
image processing part 200 employ an inner product of edge normal
direction vectors so that the correlation becomes higher as the
directions of the edge normal direction vectors become closer.
This inner product is evaluated by use of an even-numbered times angle
expression. For convenience, only a double angle will be hereinafter
described as an example of an even-numbered times angle, but the same
effect of the present invention can be produced with other even-numbered
times angles such as a quadruple angle or a sextuple angle.
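A small numerical check may make the benefit of the double angle concrete. The sketch below is an illustration only: the angles of 30 and 40 degrees are arbitrary example values, and the helper names (`unit`, `inner`, `double_angle`) are hypothetical. It compares the ordinary inner product with the inner product of double-angle vectors when the input edge normal is reversed, as happens when the background brightness is inverted.

```python
import math


def unit(angle):
    """Unit vector pointing in the given direction (radians)."""
    return (math.cos(angle), math.sin(angle))


def inner(a, b):
    """Inner product of two 2-D vectors."""
    return a[0] * b[0] + a[1] * b[1]


def double_angle(v):
    """Map a unit vector (cos a, sin a) to (cos 2a, sin 2a)."""
    x, y = v
    return (x * x - y * y, 2.0 * x * y)


template_normal = unit(math.radians(30.0))
input_normal = unit(math.radians(40.0))
flipped_input = (-input_normal[0], -input_normal[1])  # same edge, reversed contrast

plain_same = inner(template_normal, input_normal)
plain_flipped = inner(template_normal, flipped_input)

double_same = inner(double_angle(template_normal), double_angle(input_normal))
double_flipped = inner(double_angle(template_normal), double_angle(flipped_input))
```

The ordinary inner product changes sign when the normal is reversed (`plain_flipped` is the negative of `plain_same`), whereas the double-angle inner product is identical in both cases, because reversing a vector adds 180 degrees to its angle and therefore 360 degrees to the doubled angle.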
[0051]

Next, the template image processing part 100 will be described. 
An edge extraction unit 1 applies differential processing (edge extraction) 
to a template image in x and y directions, and outputs an edge normal 
direction vector of the template image. 
[0052]

In this embodiment, a Sobel filter of [Formula 2] is used for the x
direction, and a Sobel filter of [Formula 3] is used for the y direction.
[0053]

An edge normal direction vector of the template image defined by 
the following formula is obtained with these filters. 
[Formula 4] 

This embodiment takes an example in which an image of a person 
in a specific posture who is walking through a pedestrian crossing is 
extracted from an input image of the vicinity of the pedestrian crossing. 
[0054]

The template image of the person is that of Fig. 2(a), for example. 
If the filter processing of Formula 2 is applied to the template image of Fig. 
2(a), a result (x component) as shown in Fig. 2(b) will be obtained, and, if 
the filter processing of Formula 3 is applied to the template image of Fig. 
2(a), a result (y component) as shown in Fig. 2(c) will be obtained. 
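The filtering step can be illustrated with a short sketch. Formulas 2 and 3 are not reproduced in this text, so the kernels below are the commonly used 3x3 Sobel kernels; their exact signs and orientation relative to the original formulas are an assumption, as are the names `SOBEL_X`, `SOBEL_Y`, and `filter3x3`. The image is a small list of rows containing a vertical step edge.

```python
# Commonly used 3x3 Sobel kernels (assumed; the original Formulas 2 and 3 may differ in sign).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def filter3x3(image, kernel):
    """Apply a 3x3 kernel to every interior pixel; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = float(sum(kernel[dy][dx] * image[y + dy - 1][x + dx - 1]
                                  for dy in range(3) for dx in range(3)))
    return out


# Dark on the left, bright on the right: a vertical edge.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

tx = filter3x3(image, SOBEL_X)  # x component of the edge normal direction vector
ty = filter3x3(image, SOBEL_Y)  # y component of the edge normal direction vector
```

For this image the x component is non-zero along the edge and the y component is zero, so the edge normal direction vector points horizontally across the edge.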
[0055]

An evaluation vector generation unit 2 inputs an edge normal
direction vector of the template image from the edge extraction unit 1, and
outputs the evaluation vector of the template image to an orthogonal
transformation unit 3 through the processing described later.
[0056]

First, the evaluation vector generation unit 2 normalizes the length 
of the edge normal direction vector of the template image according to the 
following formula. 
[Formula 5] 

Generally, the edge strength of the input image varies according to
photographic conditions. However, an angular difference between the
edges of the input and template images (or a value of a function
that depends on this angular difference) is insensitive to
photographic conditions.
[0057]

Therefore, in the present invention, the edge normal direction
vector of the input image is normalized to a length of 1 in the input image
processing part 200, as described later. Correspondingly, the edge
normal direction vector of the template image is normalized to a length
of 1 in the template image processing part 100.

[0058]

As a result, the stability of pattern extraction can be improved.
Usually, it is desirable to have a normalization length of 1, but other
constants can be used.

[0059]

Concerning trigonometric functions, the following double-angle
formula holds, as is well known.
[Formula 6] 
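
For reference, the standard double-angle identities of trigonometry are given below. Formula 6 presumably states these or an equivalent form; the symbol θ and the exact arrangement are assumptions.

```latex
\cos 2\theta = 2\cos^{2}\theta - 1, \qquad
\sin 2\theta = 2\cos\theta\,\sin\theta
```

With these identities, the double-angle cosine and sine can be computed directly from the components of a unit vector (cos θ, sin θ) without evaluating the angle itself.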

The evaluation vector generation unit 2 calculates the evaluation 
vector of the template image defined by the following formula. 
[Formula 7] 

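Where "a" is a threshold (for fine edge removal), the evaluation
vector V of the template image is as follows:

if the length of the edge normal direction vector is at least "a", the x
and y components of V are the double-angle cosine and sine of the
normalized vector of Formula 5, each divided by "n";

else, V is the 0 vector;

wherein n is the number of edge normal direction vectors whose length
is at least "a".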

Formula 7 will be described. The reason why a vector smaller
than constant "a" is set as a 0 vector is to remove noise, for example.
[0060]

Next, a description will be given of the fact that x and y 
components of this evaluation vector are each divided by "n" for 
normalization. 

[0061]

Generally, the shape of the template image is arbitrary, and its
edges take various shapes. For example, there is a situation in which
the number of edges is small as shown in Fig. 8(a), or the number of edges
is larger (than that of Fig. 8(a)) as shown in Fig. 8(b). Therefore, in this
embodiment, the similarity is evaluated on the same scale
by dividing by "n" for normalization, independent of whether the number
of edges is small or large.
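The steps described so far for the evaluation vector (length normalization, thresholding with "a", the double angle, and division by "n") can be put together in a short sketch. This is an illustration under stated assumptions rather than the formula of the specification: the comparison `length >= a`, the function name `evaluation_vectors`, and the list-of-rows data layout are all assumptions.

```python
import math


def evaluation_vectors(tx, ty, a):
    """Build double-angle evaluation vectors from x/y derivative maps.

    tx, ty: lists of rows holding the x and y components of the edge
            normal direction vector at each pixel.
    a:      threshold below which a vector is treated as noise.
    Returns (vx, vy, n), where n is the number of vectors that were kept.
    """
    height, width = len(tx), len(tx[0])
    vx = [[0.0] * width for _ in range(height)]
    vy = [[0.0] * width for _ in range(height)]
    n = 0
    for y in range(height):
        for x in range(width):
            length = math.hypot(tx[y][x], ty[y][x])
            if length >= a and length > 0.0:
                ux, uy = tx[y][x] / length, ty[y][x] / length  # normalize to length 1
                vx[y][x] = ux * ux - uy * uy                   # cos of the double angle
                vy[y][x] = 2.0 * ux * uy                       # sin of the double angle
                n += 1
    if n > 0:
        vx = [[value / n for value in row] for row in vx]      # normalize by edge count
        vy = [[value / n for value in row] for row in vy]
    return vx, vy, n


tx = [[3.0, 0.0],
      [0.1, -3.0]]
ty = [[0.0, 2.0],
      [0.0, 0.0]]
vx, vy, n = evaluation_vectors(tx, ty, a=1.0)
```

In this example the weak vector (0.1, 0.0) is set to the 0 vector, three vectors are kept, and the vectors (3, 0) and (-3, 0), which point in opposite directions, yield the same evaluation vector.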
[0062]

However, the normalizing processing of dividing by "n" does not
necessarily need to be performed, and it can be
omitted in a situation in which only one template image is used or in
which only template images having the same number of edges are used.
[0063]

Next, a description will be given of the fact that the x and y
components of Formula 7 are functions of the cosine and sine of double
the angle represented by the x and y components of Formula 5.

[0064]

If the inner product cos θ, where θ is the angle between an evaluation
vector T of the template image and an evaluation vector I of the input
image, is used as a similarity scale as in the conventional technique, the
following problem will arise.
[0065]

For example, let us suppose that the template image is that of Fig. 
7(a), and the input image is that of Fig. 7(b). In the background of Fig. 
7(b), the left background part of an image of a specified object is brighter 
than the image of the specified object, and the right background part of the 
image of the specified object is darker than the image of the specified 
object. 






[0 0 6 6] 

If considered only from the images, the images of the specified
object completely coincide with each other when the center of the template
image of Fig. 7(a) coincides with the center of the input image of Fig. 7(b),
and therefore the similarity value must be the maximum at this time.
Herein, if the edge normal direction vector that is directed outward from
the image of the object is positive, the edge normal direction vector must
be directed in the same direction (outward/inward) in the brighter
background part of Fig. 7(b) and in the darker background part thereof
when observed from the image of the specified object.
[0 0 6 7] 

However, if the brightness of the background of Fig. 7(b) is uneven 
in the right and left background parts with the specified object 
therebetween at this time, the directions are opposed as shown by arrows 
in Fig. 7(b) (i.e., outward from the specified object in the brighter 
background part, and inward from the specified object in the darker 
background part). 



[0 0 6 8] 

Originally, the similarity value should reach its maximum here, but it
does not necessarily become high in this case, and therefore erroneous
recognition easily occurs.

[0 0 6 9] 

A detailed description of the above problem will be given with
reference to Fig. 6. In the case where the angle between the evaluation
vector T of the template image and the evaluation vector I of the input
image is represented as θ, and its inner product, i.e., cos θ, is used as
a similarity value, the direction of the evaluation vector I of the input
image has two possibilities, i.e., I of Fig. 6 and I' precisely opposite
thereto, because of unevenness in brightness of the background that exists
in the periphery of the image of the specified object, as described above.
[0 0 7 0] 

Therefore, the inner product used as a similarity scale will have two
possible values, i.e., cos θ and cos θ'.
[0 0 7 1 ]
Herein, θ + θ' = π, and cos θ' = cos(π − θ) = −cos θ.
[0 0 7 2] 

Therefore, if cos θ is used as the similarity scale, the similarity value
will instead be reduced in cases where it should properly be increased,
and will instead be increased in cases where it should properly be
reduced.
[0 0 7 3] 

In other words, according to the prior art similarity value, matching
between the template image and the input image cannot be correctly
evaluated. As a result, disadvantageously, erroneous recognition easily
occurs with the conventional technique, and a recognition result will not
be clear even if images are formed.
[0 0 7 4] 

Therefore, in the present invention, the cosine of the double angle of
θ, i.e., cos(2θ), is used for the formula of the similarity value. Thereby,
cos(2θ') = cos(2θ) from the double angle formula of Formula 6 even if
cos θ' = −cos θ. In other words, the similarity value rises irrespective
of the background when it should properly rise.
Therefore, the matching therebetween can be correctly evaluated in spite
of the unevenness in brightness of the background. This applies to the
quadruple angle or the sextuple angle as well as the double angle.
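
The effect of doubling the angle can be checked numerically. The sketch below is an illustration only: the helper `double_angle` and the example angle of 30 degrees are assumptions introduced here. It compares the plain inner product with the inner product of the double-angle vectors when the input vector is reversed.

```python
import numpy as np

def double_angle(v):
    """Hypothetical helper: map a vector at angle alpha to (cos 2a, sin 2a)."""
    x, y = v / np.linalg.norm(v)
    return np.array([2.0 * x * x - 1.0, 2.0 * x * y])

theta = np.deg2rad(30.0)                             # example angle between T and I
t = np.array([1.0, 0.0])                             # template-side vector
i = np.array([np.cos(theta), np.sin(theta)])         # input-side vector

plain_same = t @ i            # cos(theta)
plain_flipped = t @ (-i)      # cos(pi - theta) = -cos(theta)

double_same = double_angle(t) @ double_angle(i)       # cos(2*theta)
double_flipped = double_angle(t) @ double_angle(-i)   # unchanged by the reversal
```

The plain inner product changes sign when the vector is reversed, while the double-angle inner product stays at cos(2θ).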
[0 0 7 5] 

Moreover, the property that the similarity value decreases monotonically
with respect to angle θ does not change even if an even-numbered
multiple angle cosine is used.
[0 0 7 6] 

Therefore, according to the present invention, patterns can be stably 
extracted by an even-numbered times angle evaluation regardless of the 
brightness condition of the background. 
[0 0 7 7] 

More specifically, the similarity value is defined by the following 
formula in the present invention. 

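[Formula 8]
\[ L(x,y) = \sum_{i}\sum_{j} \left\{ K_x(x+i,\,y+j)\,V_x(i,j) + K_y(x+i,\,y+j)\,V_y(i,j) \right\} \]
L(x,y): Similarity value
K = (Kx, Ky): Evaluation vector of input image
V = (Vx, Vy): Evaluation vector of template image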

Since Formula 8 consists only of addition and multiplication, the 
similarity value is linear for each evaluation vector of the input and 
template images. Therefore, when Formula 8 is subjected to Fourier 
transformation, Formula 9 is obtained by a discrete correlation theorem of 
Fourier transformation (reference: Fast Fourier Transformation translated 
by Hiroshi Miyakawa published by Science and Technology). 

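[Formula 9]
\[ \tilde{L}(u,v) = \tilde{K}_x(u,v)\,\tilde{V}_x(u,v)^{*} + \tilde{K}_y(u,v)\,\tilde{V}_y(u,v)^{*} \]
\tilde{K}_x, \tilde{K}_y: Fourier transformation values of Kx, Ky
\tilde{V}_x^{*}, \tilde{V}_y^{*}: Fourier transformation complex conjugates of Vx, Vy

Hereinafter, a Fourier transformation value is represented as ~, and a
complex conjugate is represented as *.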
[0 0 7 8] 

The similarity value of Formula 8 is obtained by subjecting
Formula 9 to inverse Fourier transformation. From careful consideration
of Formula 9, the following two points become apparent.



[0 0 7 9] 

First, (1) the Fourier transformation value of the template image 
and the Fourier transformation value of the input image can be simply 
multiplied together in a transformation value that has been subjected to 
orthogonal transformation. 

[0 0 8 0] 

Secondly, (2) the Fourier transformation value of the template 
image and the Fourier transformation value of the input image do not need 
to be simultaneously calculated, and the Fourier transformation value of 
the template image may be calculated prior to that of the input image. 
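
The relation between the product-sum similarity value and its Fourier-domain counterpart can be sketched as follows. This is an illustration under assumptions, not the specification's implementation: the function names, the random test data, and the circular (wrap-around) treatment of image borders are introduced here. The arrays `kx`, `ky` stand for the components of the input-image evaluation vector and `vx`, `vy` for those of the template.

```python
import numpy as np

def similarity_direct(kx, ky, vx, vy):
    """Product-sum similarity computed position by position (circular borders)."""
    h, w = kx.shape
    th, tw = vx.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            for j in range(th):
                for i in range(tw):
                    yy, xx = (y + j) % h, (x + i) % w
                    out[y, x] += kx[yy, xx] * vx[j, i] + ky[yy, xx] * vy[j, i]
    return out

def similarity_fft(kx, ky, vx, vy):
    """Same map via the discrete correlation theorem."""
    shape = kx.shape
    # The template spectra (zero-padded to the input size) could be computed
    # and stored before the input image is available.
    vx_spec = np.conj(np.fft.fft2(vx, s=shape))
    vy_spec = np.conj(np.fft.fft2(vy, s=shape))
    spec = np.fft.fft2(kx) * vx_spec + np.fft.fft2(ky) * vy_spec
    return np.fft.ifft2(spec).real

rng = np.random.default_rng(0)
kx, ky = rng.standard_normal((2, 8, 8))
vx, vy = rng.standard_normal((2, 3, 3))
direct_map = similarity_direct(kx, ky, vx, vy)
fft_map = similarity_fft(kx, ky, vx, vy)
```

Both functions give the same map; the Fourier version replaces the nested loops with one multiplication per frequency.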
[0 0 8 1 ] 

Therefore, in this embodiment, the template image processing part
100 is provided with a recording unit 5 to record the output of a
compression unit 4 before inputting the input image. Thereby, after the
input image is input to the input image processing part 200, the template
image processing part 100 has no need to process the template image, and
the processing performance can be concentrated on the processing of the
input image processing part 200 and of the stages from a multiplication
unit 10 onward, thus making it possible to improve the processing speed.
[0 0 8 2] 

Next, a description will be given of a structure subsequent to the 
evaluation vector generation unit 2. As shown in Fig. 1, in the template 
image processing part 100, the evaluation vector of the template image 
that is output from the evaluation vector generation unit 2 is subjected to 
Fourier transformation by the orthogonal transformation unit 3, and is 
output to the compression unit 4. 
[0 0 8 3] 

The compression unit 4 reduces the evaluation vector that has been
subjected to Fourier transformation, and stores it in the recording unit 5.
As shown in Fig. 3, the evaluation vector subjected thereto includes
various frequency components in both x and y directions. From
experiments carried out by the present inventors, it is known that sufficient
accuracy can be obtained by processing only the low frequency components
(e.g., the low-frequency halves in the x and y directions, respectively)
without processing all of the frequency components. In Fig. 3, the range
(−a ≤ x ≤ a, −b ≤ y ≤ b) having no oblique lines is the original one, and
the range (−a/2 ≤ x ≤ a/2, −b/2 ≤ y ≤ b/2) having oblique lines is the
one that has been reduced. That is, the processing amount becomes 1/4
of what it was before.
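
A minimal sketch of this reduction, assuming a NumPy spectrum with even dimensions: the helper name `keep_low_frequencies` and the choice to return the data in centered (shifted) layout are assumptions made for illustration.

```python
import numpy as np

def keep_low_frequencies(spectrum):
    """Hypothetical helper: keep the low-frequency half in x and in y.

    The spectrum is shifted so that frequency 0 is at the center, and the
    central block of half the height and half the width is returned in
    that shifted layout.
    """
    h, w = spectrum.shape
    shifted = np.fft.fftshift(spectrum)
    y0, x0 = h // 4, w // 4
    return shifted[y0:y0 + h // 2, x0:x0 + w // 2]

rng = np.random.default_rng(0)
spectrum = np.fft.fft2(rng.standard_normal((64, 64)))
reduced = keep_low_frequencies(spectrum)
```

For a 64 x 64 spectrum the result is 32 x 32, i.e., one quarter of the original number of values, and the zero-frequency term is retained at the center.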
[0 0 8 4] 

Thereby, the amount of data to be processed can be reduced, and the
processing speed can be improved.
[0 0 8 5] 

The compression unit 4 and the recording unit 5 can be omitted 
when a data amount is small, or high-speed processing is not required. 
[0 0 8 6] 

Next, the input image processing part 200 will be described. The 
input image processing part 200 performs almost the same processing as 
the template image processing part 100. That is, according to Formula 2 
and Formula 3, the edge extraction unit 6 outputs an edge normal direction 
vector of the input image defined by the following formula. 



[Formula 10] 

Edge normal direction vector of input image
Ix: Derivative value of input image in x direction 
Iy: Derivative value of input image in y direction 

The evaluation vector generation unit 7 inputs the edge normal 
direction vector of the input image from the edge extraction unit 6, and 
outputs an evaluation vector of the input image defined by the following 
two formulas. 
[Formula 11] 

Length normalized vector of input image
[Formula 12] 

When "a" is assumed to be a threshold (for fine edge removal), the
evaluation vector K of the input image is:

if 

else 

The input image processing part 200 differs from the template
image processing part 100 only in the fact that normalizing processing of
the division by "n" is not performed. In other words, evaluation by the
even-numbered times angle, the normalizing processing to 1 in length, and
the noise removal processing are performed in the same way as in the
template image processing part 100.
[0 0 8 7] 

Next, a structure subsequent to the evaluation vector generation 
unit 7 will be described. As shown in Fig. 1, in the input image 
processing part 200, the evaluation vector of the input image that is output 
from the evaluation vector generation unit 7 is subjected to Fourier 
transformation by the orthogonal transformation unit 8, and is output to 
the compression unit 9. 
[0 0 8 8] 

The compression unit 9 reduces the evaluation vector that has been
subjected to Fourier transformation, and outputs it to the multiplication
unit 10. Herein, the compression unit 9 reduces the data to be
processed to the same frequency band as the compression unit 4 (e.g.,
the low-frequency halves in the x and y directions, respectively, in
this embodiment).
[0 0 8 9] 

The compression unit 9 can be, of course, omitted when the data 
amount is small, or when high-speed processing is not required, and the 
compression unit 9 is likewise omitted when the compression unit 4 is 
omitted in the template image processing part 100. 
[0 0 9 0] 

Next, the multiplication unit 10 and the structure subsequent
thereto will be described. When the processing in the template image
processing part 100 and in the input image processing part 200 is
completed, the multiplication unit 10 inputs a Fourier transformation value
of each evaluation vector of the template and input images from the
recording unit 5 and the compression unit 9.
[0 0 9 1 ] 

Thereafter, the multiplication unit 10 performs a product-sum-calculation
according to Formula 9, and outputs the result (i.e., Fourier transformation
value of the similarity value L) to an inverse orthogonal transformation
unit 11.

[0 0 9 2] 

The inverse orthogonal transformation unit 11 subjects the Fourier 
transformation value of the similarity value L to inverse Fourier 
transformation, and outputs the map L(x,y) of the similarity value L to a 
map processing unit 12. The map processing unit 12 extracts a 
high-value point (peak) from the map L(x,y), and outputs its position and 
its value. The map processing unit 12 and the construction subsequent to 
this can be freely arranged if necessary. 
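
The peak extraction performed by the map processing unit 12 can be sketched as below. The function name `find_peak` and the small demonstration map are assumptions; only the single highest point is returned, whereas an actual unit may report several peaks.

```python
import numpy as np

def find_peak(similarity_map):
    """Hypothetical helper: return ((x, y), value) of the highest point."""
    y, x = np.unravel_index(np.argmax(similarity_map), similarity_map.shape)
    return (int(x), int(y)), float(similarity_map[y, x])

demo_map = np.zeros((5, 7))
demo_map[3, 2] = 0.9    # strongest response at x=2, y=3
demo_map[1, 5] = 0.4    # weaker response elsewhere
position, value = find_peak(demo_map)
```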
[0 0 9 3] 

Next, a processing example by the template image of Fig. 2 will be 
described with reference to Fig. 4 and Fig. 5. If the input image is as 
shown in Fig. 4(a), the edge extraction unit 6 extracts an edge component 
in the x direction as shown in Fig. 4(b), and extracts an edge component in 
the y direction as shown in Fig. 4(c). 
[0 0 94] 

As a result of the aforementioned processing, a similarity value
map L(x,y) shown in Fig. 5(a) is obtained. Herein, the tip of the
arrow indicated as the "maximum value" is the peak of this map, and, as is
apparent from a comparison with the input image of Fig. 5(b), it is
understood that the correct position is clearly identified as a single
point.

[0 0 9 5] 

In the conventional technique, the template image must be successively
scanned over the input image and pyo of Formula 1 obtained at each
position, so that the number of products to be calculated is 2AB, wherein
the size of the input image is A (= 2^γ) and the size of the template
image is B. Herein, the number of calculations is evaluated by the number
of products, whose calculation cost is high.
[0 0 9 6]

In contrast, in this embodiment, FFT is performed twice by the
orthogonal transformation units 3 and 8, the product sum calculation is
then performed by the multiplication unit 10, and inverse FFT is performed
once by the inverse orthogonal transformation unit 11. The number of
calculations is merely the following number of products:
3{(2γ−4)A+4}+2A
[0 0 9 7] 

In a comparison of the number of calculations therebetween, the
number of products according to this embodiment becomes about 1/100 of
the number of products according to the conventional technique, if
A = 256×256 = 2^16 and B = 60×60.
As a result, extremely high-speed processing can be carried out.
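
The two product counts can be evaluated for the stated sizes. In this sketch the exponent is written as `gamma`, with A = 2^gamma; reading the exponent symbol this way is an assumption based on A = 2^16 in the text.

```python
gamma = 16                 # assumed exponent, so that A = 2**gamma
A = 2 ** gamma             # input image size, 256 x 256
B = 60 * 60                # template image size

conventional = 2 * A * B                              # products when scanning the template
fft_based = 3 * ((2 * gamma - 4) * A + 4) + 2 * A     # products with three FFTs plus the product sum
ratio = conventional / fft_based
```

With these values the conventional count is 471,859,200 products and the FFT-based count is 5,636,108, a ratio of roughly 84, which is of the same order as the "about 1/100" stated above.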
[0 0 9 8] 

The discrete correlation theorem of Fourier transformation cannot 
be used in such a nonlinear formula as Formula 1 of the conventional 
technique. 

[0 0 9 9] 

Therefore, in the conventional technique, processing for the
template image cannot be performed prior to that for the input image as
shown in Fig. 1 of this embodiment. In other words, in the conventional
technique, both the template image and the input image must be
simultaneously processed. Also in this respect, the processing speed in
this embodiment becomes higher than that in the conventional technique.
[0100]

(Embodiment 2) 

In this embodiment, a conjugate compression unit 13 and a 
conjugate restoring unit 14 are added to the elements of Fig. 1 as shown in 
Fig. 9. The conjugate compression unit 13 further halves the Fourier 
transformation value of the evaluation vector of the template image that 
has been read from the recording unit 5 by use of complex conjugate 
properties of Fourier transformation. 
[0101] 

This point will be described. The following formula is
established for the spectrum obtained by Fourier transformation of
real-valued data.

[Formula 13] 

In detail, a spectral value at a specific point is equal to the complex
conjugate of a spectral value at a position symmetrical to that point in a uv
coordinate system. Use of this property makes it possible to reduce the
processing amount to half of the uv coordinate system as shown in Fig.
10(a).

[0102] 

Since the compression unit 4 and the compression unit 9 further 
compress it, the data amount can be reduced to 1/8 of the original data as 
shown in Fig. 10(b). Therefore, the processing speed can be improved, 
and memory capacity can be saved. 
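
The complex conjugate property used by the conjugate compression unit 13 can be verified with a small example. The random test data is an assumption, and NumPy's `rfft2`/`irfft2` are used here only as a convenient stand-in for storing and restoring half of the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((8, 8))      # real-valued data
full = np.fft.fft2(data)

# Value at (-u, -v) for every (u, v): reverse both axes, keeping index 0 fixed.
mirrored = np.roll(full[::-1, ::-1], shift=(1, 1), axis=(0, 1))
is_conjugate_symmetric = np.allclose(full, np.conj(mirrored))

# Half of the spectrum is enough to restore the original data.
half = np.fft.rfft2(data)
restored = np.fft.irfft2(half, s=data.shape)
```

Here `half` has shape (8, 5) instead of (8, 8), and the inverse transformation still recovers the data.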
[0103] 

As a result, in this embodiment, the Fourier transformation value
that the conjugate compression unit 13 has read from the recording unit 5
is halved, and is output to the multiplication unit 10. Thereafter, the
conjugate restoring unit 14 performs processing by which the output
from the multiplication unit 10 is doubled, and outputs it to the inverse
orthogonal transformation unit 11.
[0 10 4] 

These constitute the basis of the present invention. Useful
techniques that apply this basis will be disclosed hereinafter.
[0105] 

(Embodiment 3) 

In this embodiment, a first example and a second example will be 
described. A technique for performing face extraction by using an 
enlarged/reduced template image is disclosed in either example. 
According to this technique, processing by use of similar templates 
different in size can be carried out efficiently and at high speed. The 
application of this technique is not limited only to face extraction. 
[0106] 

[First Example]

Fig. 11(a) is a block diagram of a template image processing part 
according to Embodiment 3 (first example) of the present invention. 
Some modifications have been made to the template image processing part 
100 of Fig. 1 or Fig. 9 as shown in the figure. 
[0 10 7] 

First, when a template image is input to a template image
processing part 101, the image is enlarged/reduced by an
enlargement/reduction unit 15. Thereafter, the enlarged/reduced template
image is output to an edge extraction unit 1.
[0108] 

When an evaluation vector generation unit 2 outputs the evaluation 
vector of the template image, an addition unit 16 applies addition 
processing to this, and data obtained by the addition processing is output 
to an orthogonal transformation unit 3. 
[0109] 

The addition unit 16 performs addition processing for each size 
range according to the following formula. 

[Formula 14] 
Evaluation vector of template image 
wherein the size range is a~b. 

Although image processing is carried out by use of one template
image in Embodiments 1 and 2, templates of multiple sizes are
processed by use of a plurality of templates in this embodiment.
[0 110]

Moreover, the template sizes are divided into specific ranges, and the
template processing results within each range are superimposed.
[0111] 

Fig. 12 shows an example of an input image that includes a face 
image of a person. In this example, a template image of a straight face, 
such as that of Fig. 13(a), is prepared. A template image of a face 
inclined at a specific angle as shown in Fig. 13(d) is further prepared. 
The degree to which the face to be prepared is inclined can be 
appropriately selected. 
[0 112] 

When the edges of the template images of Figs. 13(a) and 13(d) are 
extracted, appearances shown in Fig. 13(b) and 13(e), respectively, are 
obtained in this embodiment. Further, when the image of Fig. 13(a) or 
13(d) is input, an evaluation vector corresponding to the edge image is 
generated as shown in Fig. 13(c) or 13(f). 
[0 113]
As in Embodiments 1 and 2, data concerning the template images
are stored in the recording unit 5.
[0 1 14] 

When the input image is input, the processing of the input image
processing part 200 and of the stages from the multiplication unit 10
onward is performed in the same way as in Embodiments 1 and 2.

[0 115] 

Corresponding similarity value maps are calculated for 
superimposed template data of all size ranges of the template images. 
[0116] 

Thereby, a detection result, such as that shown in Fig. 14, is 
obtained. As is apparent from Fig. 14, not only the overall position of the 
face but also its size and the positions of the main face parts, such as the 
eyes, nose, and mouth, can be recognized. 
[0 117] 

At this time, it is desirable to display the processing result on a
display or to output it to a printer as shown in Fig. 14 in the stage
subsequent to the map processing unit 12.
[0 118] 

According to the First Example, the following effects can be obtained.
(1) Even if template data in a specific size range is added, and image
processing is performed with layered templates, a part similar to the
template often shows a high similarity value. If similarity values were
calculated separately for all sizes, the processing of the product sum
calculation part and the processing of the inverse orthogonal
transformation part would have to be repeated NM times, wherein N is
the number of the templates, and M is the number of all sizes. On the
other hand, the number of repetitions is reduced to NM/H by use of the
superimposed template, wherein H is the width of a superimposed range.
Therefore, improvements in efficiency can be accomplished, and image
processing can be carried out at a higher speed for face extraction, for
example.

[0 119] 

(2) Additionally, not only can the positional data for a face be output,
but a rough candidate range of the eyes, nose, and mouth can also be
extracted by superimposing the template on the face candidate position
as shown in Fig. 14.
[0 12 0] 

The positions of face parts, such as the eyes, nose, and mouth, can
also be more accurately extracted by processing the images in these ranges
much more finely, as described in the following embodiments.

[0121] 

[Second Example]

In this example, a template image processing part 102 is 
constructed as shown in Fig. 11(b). That is, the position of the addition 
unit 16 is moved between the compression unit 4 and the recording unit 5, 
in comparison with the First Example. Thereby, the addition unit 16 is 
formed to perform addition processing according to the following formula. 
[Formula 15] 

wherein the size range is a~b. 

The similarity value formula that is linear before being subjected to
Fourier transformation is still linear after being subjected thereto.
Therefore, the position of the addition unit 16 can be changed from the
First Example to the Second Example, for example.
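
The linearity that allows the addition unit 16 to be moved can be illustrated numerically. In this sketch the array `templates` is a random stand-in (an assumption) for one evaluation vector component of three templates in a size range, already placed on a common 16 x 16 grid.

```python
import numpy as np

rng = np.random.default_rng(0)
templates = rng.standard_normal((3, 16, 16))   # stand-ins for three template sizes

# First Example order: add the evaluation vectors, then transform.
sum_then_transform = np.fft.fft2(templates.sum(axis=0))

# Second Example order: transform each one, then add the spectra.
transform_then_sum = np.fft.fft2(templates, axes=(-2, -1)).sum(axis=0)
```

Both orders give the same spectrum, so the addition may be placed wherever the amount of data is smallest.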
[0 12 2] 

Thereby, the amount of data to be processed by the addition unit 16 is
smaller than in the First Example because the data has been compressed
by the compression unit 4. Therefore, suitably, processing speed can be
improved.

[0123] 
(Embodiment 4) 

Embodiment 4 will be described with reference to Fig. 15 and Fig. 
16. This Embodiment discloses an efficient technique for turning the 
value of the maximum point of the similarity value map described in 
embodiments 1 and 2 into a stronger peak value. 
[0 12 4] 

Generally, in the similarity value map, a peak appears in a part
overlapping with the template image. In this embodiment, a peak pattern
"p" around and including a maximum point is used as a filter for the
similarity value map, and the value of a part similar to the peak pattern in
the similarity value map is amplified.
[0 12 5] 

As shown in Fig. 15, in this embodiment, a peak pattern processing 
part 300 is added to the structure of Fig. 1 showing embodiment 1. 
[0126] 

Fig. 16 shows a mask for this peak pattern. As indicated in Fig. 
16, in this peak pattern, normalization is made to set an average value at 0. 
[0127] 

In the peak pattern processing part 300, an orthogonal 
transformation unit 17 subjects this peak pattern to Fourier transformation, 
and a compression unit 18 compresses a Fourier transformation value so as 
to record compressed data in a recording unit 19. 
[0 12 8] 

Since the mask is used, the similarity value formula is established
not by Formula 8 but by the following formula that reflects the mask.
[Formula 16]

Peak pattern low frequency spectrum 
L(u,v): Similarity value before filtering 
M(u,v): Similarity value after filtering 

Therefore, the multiplication unit 10 reads data from the recording 
unit 5, the recording unit 19, and the compression unit 9, then performs a 
product-sum-calculation, and outputs the Fourier transformation value of 
the similarity value corrected by the peak pattern. 
[0129] 

A peak pattern filter can filter the similarity value map L according 
to the following formula, but, if so, a large amount of product sum 
calculations will be inefficiently needed. 
[Formula 17] 

In contrast, in this embodiment, simple, accurate processing is 
performed according to Formula 16 without performing a large amount of 
calculations like Formula 17. 
[0 13 0]
Therefore, according to this embodiment, the peak point of the
similarity value map can be efficiently amplified. Additionally, a part
similar to the template can be clearly and stably detected from the input
image while reflecting the peak pattern.

[0131] 
(Embodiment 5) 

In this embodiment, a pixel mean in the range of the template 
image as well as the similarity value between the edge of the input image 
and the edge of the template image is added to a similarity judgment. 
[0132] 

This structure is shown in Fig. 17. Like Fig. 15, a mask pattern 
processing part 400 is added to the structure of Fig. 1 of Embodiment 1. 
[0133] 

However, the structure of Fig. 17 differs from that of Fig. 15 in the
fact that the mask pattern processing part 400 is used not to input a peak
pattern but to input a template image, and that a mask pattern generation
unit 20 is provided for generating a mask pattern that depends on this
image.

[0 13 4] 

Like Fig. 15, the output of the mask pattern generation unit 20 is 
subjected to Fourier transformation by the orthogonal transformation unit 
21, is then compressed by the compression unit 22, and is recorded in the 
recording unit 23. 

[0 13 5] 

Further, since a mask is used, the similarity value formula is 
expressed not as Formula 8 but as the following formula that reflects the 
mask. 

[Formula 18] 

Q(x,y): Pixel-average-added similarity value 
q(l,m): Mask pattern 
L(x,y): Similarity value before filtering 
I(x,y): Input image data 

For the same reason mentioned in Embodiment 4, a large number of
product sum calculations will be needed, which is inefficient, if
multiplication is performed in this way.
[0 13 6] 

Formula 19 is obtained by subjecting this to Fourier transformation, 
and the calculation can be performed very simply. 
[Formula 19] 

[0 13 7] 

Therefore, the multiplication unit 10 performs a 
product-sum-calculation according to Formula 19. 
[0 13 8] 

Next, the relationship between the template image and the mask
pattern will be described with reference to Fig. 18. Herein, in order to
add a pixel mean in the range of the template image to a similarity
judgment, the mask pattern generation unit 20 generates a mask pattern as
shown in Fig. 18(b) for a template image as shown in Fig. 18(a).
[0139] 

In greater detail, in the template image of Fig. 18(a), a value of 1/N
is set at each point of the inner part (inside the circle) whose pixel mean is
to be calculated, and a value of 0 is set at other points. Herein, N is the
number of points in the inner part, and a result of an addition of the value
of all points of the mask pattern is 1.
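
Such a mask pattern can be sketched as follows. The function name `circular_mean_mask`, the square mask size, and the circle centered on the mask are assumptions made for illustration; the actual inner part depends on the template image.

```python
import numpy as np

def circular_mean_mask(size, radius):
    """Hypothetical helper: 1/N inside a centered circle, 0 elsewhere."""
    yy, xx = np.mgrid[:size, :size]
    center = (size - 1) / 2.0
    inside = (xx - center) ** 2 + (yy - center) ** 2 <= radius ** 2
    n_inside = np.count_nonzero(inside)      # N: number of points in the inner part
    mask = np.zeros((size, size))
    mask[inside] = 1.0 / n_inside
    return mask

mask = circular_mean_mask(size=9, radius=3)
patch = np.arange(81, dtype=float).reshape(9, 9)   # stand-in for input image data
mean_inside = float((mask * patch).sum())          # product sum gives the pixel mean
```

Because the mask values add up to 1, the product sum of the mask and the image data equals the mean of the pixels inside the circle.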
[0140] 

According to this embodiment, a mean of pixel values inside the 
image can also be added to the similarity value, and a specified object can 
be extracted from the input image more accurately and more efficiently. 
[0141] 

A mean of the squared value of each pixel can be calculated by having
the input image processing part form data in which each pixel of the
input image is squared, and by applying the same processing thereto.
[0 14 2]

Therefore, a variance as well as a mean within a range can
be efficiently calculated.

[0 14 3] 
(Embodiment 6) 

This embodiment discloses a technique by which an image bilaterally
symmetric to a template image can be efficiently processed.

[0144] 

This structure is shown in Fig. 19. That is, in addition to the 
structure of Fig. 1 of Embodiment 1, a symmetric vector generation unit 
24 is provided between the recording unit 5 and the multiplication unit 10. 
As in Embodiment 1, Formula 8 is used as the similarity value formula in 
this embodiment. 
[0145] 

Next, a description will be given of how to treat a bilaterally 
symmetric template image. For example, if the template image of Fig. 
20(a) is an original image, a template image in which the original one has 
been bilaterally reversed is shown as in Fig. 20(b). 
[0146] 

The relationship between the edge normal direction vectors of these 
template images is expressed by the following formula. 
[Formula 20]
Edge normal direction vector of the image resulting from subjecting the
original template image to bilateral reversal

The evaluation vector of the template image that has been 
bilaterally reversed is expressed by the following formula. 
[Formula 21] 

Evaluation vector of 
Evaluation vector of 

Concerning Fourier transformation, because of the relation of 
Formula 22, Formula 23 is obtained by applying Formula 22 to Formula 
21. 

[Formula 22 ] 
[Formula 23 ] 

In detail, the evaluation vector of the template image that has been
bilaterally reversed can be easily generated by, for example, reversing the
sign of the evaluation vector of the original template image.

[0147]
Therefore, the evaluation vector of the template image that has
been bilaterally reversed can be obtained only by allowing the symmetric
vector generation unit 24 to apply Formula 23 to the evaluation vector of
the original template image of the recording unit 5 in Fig. 19.

[0148] 

Put simply, there is no need to perform complex
processing, such as a procedure in which the image of Fig. 20(b) is
generated from the image of Fig. 20(a), and the evaluation vector is again
calculated from the image of Fig. 20(b).
[0 14 9] 

Thereby, the evaluation vector of the template image that has been 
bilaterally reversed can be generated without direct calculations, and 
processing speed can be improved. Additionally, recording capacity can 
be saved because the need to purposely store the template image that has 
been bilaterally reversed is obviated. 
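
The idea can be checked in the spatial domain with a small sketch. Everything below is an assumption made for illustration: the helper `evaluation_vector` uses `np.gradient` as the edge operator and a double-angle form of the evaluation vector, and the relation tested (mirror the x component, mirror and negate the y component) is derived from that form rather than copied from Formulas 20 to 23, which operate on the stored Fourier transformation values.

```python
import numpy as np

def evaluation_vector(image, a=1e-6):
    """Hypothetical helper: double-angle evaluation vector of an image."""
    ty, tx = np.gradient(image.astype(float))   # derivatives along y (rows), x (columns)
    length = np.hypot(tx, ty)
    keep = length >= a                          # drop vectors shorter than the threshold
    n = max(int(np.count_nonzero(keep)), 1)
    safe = np.where(keep, length, 1.0)          # avoid division by zero
    ux, uy = tx / safe, ty / safe
    vx = np.where(keep, 2.0 * ux**2 - 1.0, 0.0) / n
    vy = np.where(keep, 2.0 * ux * uy, 0.0) / n
    return vx, vy

rng = np.random.default_rng(0)
template = rng.random((12, 10))                 # stand-in for a template image
vx, vy = evaluation_vector(template)

# Recomputed from the bilaterally reversed image.
vx_direct, vy_direct = evaluation_vector(np.fliplr(template))

# Derived from the original evaluation vector without touching the image.
vx_derived, vy_derived = np.fliplr(vx), -np.fliplr(vy)
```

Under these assumptions the derived vector matches the recomputed one, so the reversed template never needs to be generated or stored.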

[0 15 0] 
(Embodiment 7)
In this embodiment, eye/eyebrow extraction processing is added to
the face extraction processing described in Embodiment 3.
[0 15 1] 

As in Fig. 22(b), an eye/eyebrow candidate range can be roughly
extracted from the input image of Fig. 22(a) according to the processing
described in Embodiment 3.
[0152] 

Point biserial correlation coefficient filters shown in Fig. 23(a)
through Fig. 23(d) are applied onto each point of the image of this
eye/eyebrow candidate range, a map of point biserial correlation values is
then formed, and points where the correlation value in the map becomes
high are set as an eye center point 3002 and as an eyebrow center point
3003, respectively.
[0153] 

The point biserial correlation coefficient η is defined by the
following formula (reference: Multivariate Analysis Handbook, page 17,
Modern Mathematics).
[Formula 24]

Overall range: addition of 1st range to 2nd range 

Number of pixels of 1st range

n: Number of pixels of overall range 

Average brightness level of 1st range 

Average brightness level of 2nd range 

S: Standard deviation of overall range 

Value in √ is constant when mask size is fixed.

Referring to Fig. 23(a) through Fig.23(d), Fig. 23(a) shows the 
positional relationship of all ranges, Fig. 23(b) shows an overall range 
mask, Fig. 23(c) shows a 1st range mask, and Fig. 23(d) shows a 2nd 
range mask. 
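
A point biserial correlation coefficient can be computed as in the sketch below. It uses the standard textbook form, (mean of 1st range − mean of 2nd range) / S × √((n1/n)(1 − n1/n)); that this matches Formula 24 term by term is an assumption, although it agrees with the legend above. The function name `point_biserial` and the boolean-mask argument are introduced here for illustration.

```python
import numpy as np

def point_biserial(values, first_range):
    """Hypothetical helper: point biserial correlation of two pixel ranges.

    values: pixel values of the overall range.
    first_range: boolean array, True for pixels of the 1st range; the
    remaining pixels form the 2nd range.
    """
    values = np.asarray(values, dtype=float)
    first_range = np.asarray(first_range, dtype=bool)
    n = values.size                                # pixels in the overall range
    n1 = int(np.count_nonzero(first_range))        # pixels in the 1st range
    mean1 = values[first_range].mean()             # average level of the 1st range
    mean2 = values[~first_range].mean()            # average level of the 2nd range
    s = values.std()                               # standard deviation of the overall range
    # The square root depends only on the mask, so it is constant for a fixed mask size.
    return (mean1 - mean2) / s * np.sqrt(n1 / n * (1.0 - n1 / n))

patch = np.array([[1.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0]])
first = patch == 1.0
eta_match = point_biserial(patch, first)           # brightness follows the mask exactly
eta_inverse = point_biserial(1.0 - patch, first)   # brightness is the exact opposite
```

When the bright pixels coincide exactly with the 1st range the coefficient is 1, and when they coincide with the 2nd range it is −1.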

[0 15 4] 

If the filter shape according to this point biserial correlation
coefficient is formed as shown in Fig. 23(a), it is expected that the
eyebrow center 3002 and the eye center 3003 can be extracted as shown in
Fig. 22(c).
[0 15 5]

Next, filter processing according to the point biserial correlation 
coefficient will be described. 
[0156] 

First, main components of Formula 24 are expressed by the 
following formula. 

[Formula 25] 
wherein, 

M1: 1st range mask

M2: 2nd range mask 

Ma: Overall range mask 

I(x,y): Each pixel value of input image 

The following formulas are obtained by subjecting each component to 
Fourier transformation. 
[Formula 26] 

[Formula 27] 
[Formula 28]
wherein 

To perform this processing, the structure of Fig. 21 is used, for
example. First, the masks of the overall range, the 1st range, and the
2nd range are each subjected to Fourier transformation by the orthogonal
transformation units 51 through 53.

[0157]

An input image and a result of the face extraction described in
Embodiment 3 are input to the eye/eyebrow candidate range extraction
unit 54 through the map processing unit 12. Based on these inputs, the
eye/eyebrow candidate range extraction unit 54 extracts only the
eye/eyebrow candidate range shown in Fig. 22(b) from Fig. 22(a). Data
concerning this eye/eyebrow candidate range is subjected to Fourier
transformation by an orthogonal transformation unit 55 as it is, and is
also squared by a square unit 56 and then subjected to Fourier
transformation by an orthogonal transformation unit 57.
[0158]

Thereafter, data shown in Formula 27 is input through a
multiplication unit 58 to an inverse orthogonal transformation unit 62
preceding an η map forming unit 65. Likewise, data shown in Formula
26 is input to an inverse orthogonal transformation unit 63 through a
multiplication unit 60, and data shown in Formula 28 is input to an inverse
orthogonal transformation unit 64 through a multiplication unit 61. The
input data is subjected to inverse Fourier transformation by the inverse
orthogonal transformation units 62 through 64, and is output to the η map
forming unit 65.
[0159]

Thereafter, the η map forming unit 65 performs a calculation
according to Formula 24 when receiving the data from the inverse
orthogonal transformation units 62 through 64, and outputs a map η(x,y) of
the point biserial correlation coefficient.
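
For illustration, the final calculation performed by the map forming unit 65 might be sketched as follows in Python. It is assumed here that the three inverse-transformed maps carry, for each point, the difference between the means of the two ranges, the mean of the overall range, and the mean of the squared pixel values of the overall range, so that the standard deviation follows from the last two. The names eta_from_maps, diff, mean, and mean_sq are hypothetical.

```python
import math

def eta_from_maps(diff, mean, mean_sq, n1, n):
    """Combine three per-point maps (flat lists) into a coefficient map.

    diff    : mean of 1st range minus mean of 2nd range, per point
    mean    : mean of the overall range, per point
    mean_sq : mean of squared pixel values of the overall range, per point
    n1, n   : pixel counts of the 1st range and of the overall range
    """
    k = math.sqrt((n1 / n) * (1 - n1 / n))  # constant for a fixed mask size
    eta = []
    for d, m, q in zip(diff, mean, mean_sq):
        var = max(q - m * m, 0.0)  # clamp small negative rounding errors
        # Assumption: points with zero variance are given the value 0.
        eta.append(d / math.sqrt(var) * k if var > 0 else 0.0)
    return eta
```

Because the variance is obtained as the mean of squares minus the squared mean, only linear filtering of the image and of its square is needed before this step.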
[0160]

The eye/eyebrow center extraction unit 66 extracts two points
having high values from the map η(x,y) output by the η map forming unit
65, and outputs them as the eye center and the eyebrow center,
respectively.

[0161] 

If this filter is applied directly to each pixel, as in the
conventional technique, multiplication must be performed roughly
15 × 15 × N = 225N times, wherein "15 × 15" (pixels) is the filter size of
Fig. 23(a), and N is the number of pixels of the input image.
[0162]

In contrast, according to this embodiment, the number of product
calculations is roughly
N + [(2γ − 4)N + 4] × 5 + 12N + N + N = 5N(2γ − 1) + 20
times, with N = 2^γ. For convenience, the calculation of "√" is assumed
to be equal to one multiplication.
[0163]

That is, (calculation amount of this embodiment) < (calculation
amount of the conventional technique) under the condition that γ is 22
or less.

[0164]

Usually, a huge image in which N is 2^22 or more is not used.
Therefore, the image processing of this embodiment requires fewer
calculations than the conventional processing and is performed faster.
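
The comparison above can be checked numerically. The following Python sketch simply evaluates the two operation counts given in the text, with the number of pixels taken as a power of two whose exponent is called gamma in the code. The function names and the search limit of 40 are assumptions for this example.

```python
def conventional_count(n_pixels, filter_side=15):
    """Multiplications when a 15 x 15 filter is applied at every pixel."""
    return filter_side * filter_side * n_pixels

def embodiment_count(gamma):
    """Product calculations of this embodiment for an image of 2**gamma pixels."""
    n_pixels = 2 ** gamma
    return 5 * n_pixels * (2 * gamma - 1) + 20

def largest_gamma_where_faster(limit=40):
    """Largest exponent for which the embodiment needs fewer operations."""
    best = None
    for gamma in range(1, limit + 1):
        if embodiment_count(gamma) < conventional_count(2 ** gamma):
            best = gamma
    return best
```

Evaluating largest_gamma_where_faster() returns 22, which agrees with the stated condition.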

[0165] 

Thus, according to this embodiment, the point biserial correlation 
coefficient filter processing can be more efficiently performed. 
[0166] 

The positions of other face parts, such as eye corners, mouth edges, 
nostrils, and irises, can be calculated by variously changing the masks 
shown in Fig. 23. 

[0167] 
(Embodiment 8) 

This embodiment discloses a technique for expanding the function
of the face extraction according to Embodiment 3 and extracting a mouth
range, which is a facial organ, from a face image.
[0168] 

As described in Embodiment 3 with reference to Fig. 14, a mouth 
candidate range shown in Fig. 25(b) can be extracted from the input image 
of Fig. 25(a). 

[0169] 

When the extracted mouth candidate range is projected onto the
Y-axis (i.e., when the total h of pixel values is taken along the X-axis), a
graph roughly as shown in Fig. 25(c) is obtained.
[0170] 

The total h is defined by the following formula. 
[Formula 29] 

I(x,y): Each pixel value of input image 
w: Width of mouth candidate range 
h(x,y): Projection value 

In order to efficiently obtain this total by use of orthogonal
transformation, a mask shown in Fig. 25(d) is prepared. Formula 29 can
be rewritten as the following formula, including this mask.

[Formula 30] 
Mask 

When this is subjected to Fourier transformation, the following formula is 
obtained. 
[Formula 31] 

That is, what is required to obtain this projection value is to subject
the input image and the mask pattern to Fourier transformation, thereafter
perform the calculation of Formula 31, and subject its result to inverse
Fourier transformation. Thereby, the map "h(x,y)" of the projection
value "h" can be obtained.
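
The procedure described above (transform the image and the mask, multiply the spectra, and transform back) can be sketched in Python for a single image row. This is a one-dimensional simplification under stated assumptions: the row length is a power of two, the window wraps around at the row ends, and the small recursive FFT is included only to keep the example self-contained. The names fft, window_sums_fft, and window_sums_direct are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two (no scaling)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def window_sums_fft(row, mask):
    """h[x] = sum_i mask[i] * row[(x + i) % n], computed in the frequency domain."""
    n = len(row)
    spec_row = fft([complex(v) for v in row])
    spec_mask = fft([complex(v) for v in mask])
    # Correlation theorem for a real mask: multiply by the conjugate spectrum.
    prod = [r * m.conjugate() for r, m in zip(spec_row, spec_mask)]
    return [v.real / n for v in fft(prod, invert=True)]

def window_sums_direct(row, mask):
    """Same sums computed directly, for comparison."""
    n = len(row)
    return [sum(mask[i] * row[(x + i) % n] for i in range(n)) for x in range(n)]
```

For a mask holding the value 1 over the window width and 0 elsewhere, both functions return the same windowed totals up to rounding error.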
[0171] 

From the above result, it is desirable to form the structure of Fig. 24,
for example. As shown in Fig. 24, the input image and the mask pattern
are subjected to Fourier transformation by the orthogonal transformation
units 25 and 26, respectively.

[0172]
Thereafter, the transformed values are multiplied by the
multiplication unit 27, and the product is subjected to inverse Fourier
transformation by the inverse orthogonal transformation unit 28. As a
result, the map h(x,y) of the projection value is obtained.
[0173]

The obtained map h(x,y) is developed by a projection data
extraction unit 29, and a maximum point extraction unit 30 calculates two
maximum points from this map and outputs the range between these
maximum points as the mouth range.
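
One possible way to pick the two maximum points from a projection profile is sketched below in Python. The rule used for a local maximum and the tie handling are assumptions for this example, and the function name mouth_range is hypothetical.

```python
def mouth_range(profile):
    """Return (start, end) indices of the range between the two highest peaks.

    profile is a list of projection values along one axis. A point counts as
    a peak when it is greater than its left neighbour and not smaller than
    its right neighbour. Returns None if fewer than two peaks exist.
    """
    peaks = [
        i for i in range(1, len(profile) - 1)
        if profile[i - 1] < profile[i] >= profile[i + 1]
    ]
    if len(peaks) < 2:
        return None
    # Keep the two peaks with the largest projection values, in index order.
    top_two = sorted(sorted(peaks, key=lambda i: profile[i], reverse=True)[:2])
    return top_two[0], top_two[1]
```

With a profile showing one peak for each lip, the returned pair brackets the mouth range.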
[0174]

The mask pattern can, of course, be subjected to Fourier
transformation beforehand as in Embodiment 1. If so, processing capacity
can be concentrated on the processing of the input image, and
processing speed is improved.
[0175]

According to this embodiment, the position of the mouth, which is
a facial organ (more specifically, the positions of both the upper and lower
lips), can be accurately calculated.
[0176]

(Embodiment 9) 

This embodiment discloses a technique for expanding the function 
of the face extraction according to Embodiment 3, and applying 
predetermined processing to a face image according to first and second 
examples of this embodiment. 
[0177]

(First Example)

In this example, a digital watermark is embedded into an extracted
face image. That is, the structure of Fig. 26 is formed. A result-using
part 600 is provided at the subsequent stage of the map processing unit 12
of Fig. 11.

[0178] 

The result-using part 600 includes a face image cutting-out unit 31 
for separating an input image into a face image and a non-face image with 
reference to the input image and the face position that is determined by the 
map processing unit 12. The face image part is output from the face
image cutting-out unit 31 to a digital watermark embedding unit 32, and a 
predetermined digital watermark is embedded into the face image part. 
Herein, this digital watermark can be a well-known one. 
[0179] 

On the other hand, the part excluding the face image is output from 
the face image cutting-out unit 31 directly to an image synthesizing unit 
33. 

[0180]

The image synthesizing unit 33 combines the face image into 
which the digital watermark is embedded with the part excluding the face 
image, and outputs an image into which the digital watermark has been 
embedded. 
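
The flow through units 31, 32, and 33 can be sketched in Python as follows. The text leaves the watermarking method open ("a well-known one"), so least-significant-bit embedding is swapped in here purely as a stand-in. The rectangular face box and the names embed_lsb and watermark_face are likewise assumptions for this example.

```python
def embed_lsb(face, bits):
    """Write bits into the least significant bit of successive pixels."""
    out = [row[:] for row in face]
    width = len(out[0])
    if len(bits) > width * len(out):
        raise ValueError("watermark does not fit into the face region")
    for k, bit in enumerate(bits):
        y, x = divmod(k, width)  # row-major order
        out[y][x] = (out[y][x] & ~1) | bit
    return out

def watermark_face(image, box, bits):
    """Embed bits only inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    # Cutting-out: separate the face part from the rest of the image.
    face = [row[left:right] for row in image[top:bottom]]
    marked = embed_lsb(face, bits)
    # Synthesizing: the non-face part passes through unchanged.
    out = [row[:] for row in image]
    for dy, row in enumerate(marked):
        out[top + dy][left:right] = row
    return out
```

Pixels outside the box are copied as they are, so only the face part carries the watermark.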

[0181] 

Therefore, for an input image such as that of Fig. 27(a), an output
image such as that of Fig. 27(b) is obtained, in which a digital
watermark has been embedded into the face part.



[0182] 

The name of a model to be photographed or the date of the 
photograph is suitable as a digital watermark. 
[0183]

According to this example, watermark data can be easily embedded,
in a concentrated manner, into the face part, which is liable to be falsified.
[0184]

(Second Example)

In this example, specific editing is applied only to an extracted face 
image. That is, the structure of Fig. 28 is formed. A result-using part 
601 is provided at the subsequent stage of the map processing unit 12 of 
Fig. 11. 

[0185] 

The result-using part 601 includes the face image cutting-out unit
31 for separating an input image into a face image and a non-face image
with reference to the input image and the face position that is determined
by the map processing unit 12. The face image part is output from the
face image cutting-out unit 31 to an image correction unit 34, and
predetermined editing is applied to the face image part. Herein, the
editing can be a well-known one.
[0186] 

On the other hand, the part excluding the face image is output from
the face image cutting-out unit 31 directly to the image synthesizing unit
33.

[0187]

The image synthesizing unit 33 combines the face image that has 
been edited with the part excluding the face image, and outputs an edited 
image. 

[0188] 

Therefore, for an input image such as that of Fig. 29(a), the face
range is extracted as in Fig. 29(b), and an output image in which only
the face part has been edited, as in Fig. 29(c), is obtained.
[0189]

In an image of a person photographed against the light, the face is
too dark and hard to see, so the image of the face part might be
corrected so that the face color becomes whiter. The editing by the
image correction unit 34 is not limited to this, however, and the image
may be corrected arbitrarily.
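
As one hypothetical instance of such editing, the Python sketch below brightens only the face region of a backlit grayscale image with a simple gain. The gain-based correction, the 8-bit clipping, the rectangular box, and the name brighten_face are assumptions for this example. Any other correction could be substituted.

```python
def brighten_face(image, box, gain):
    """Multiply pixels inside box = (top, left, bottom, right) by gain.

    Values are clipped to the 8-bit maximum of 255. Pixels outside the
    box are returned unchanged.
    """
    top, left, bottom, right = box
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = min(255, round(image[y][x] * gain))
    return out
```

For example, brighten_face(image, (0, 1, 2, 2), 1.5) lightens one column of a two-row image and leaves the other pixels as they were.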

[0190] 

According to this example, an image of only the face part can be 
easily and selectively corrected, or, in other words, image correction can 
be performed without exerting an influence on a part excluding the face. 
[0191] 
[EFFECT OF THE INVENTION] 

According to the present invention, patterns can be extracted stably
and at high speed with less influence from differences in photographic
conditions and background.
[0192] 

Patterns of a plurality of templates can also be extracted at high
speed.

[0193]
The similarity value map can be amplified and a part similar to the
template can be stably detected from the input image.
[0194]

Attributes inside the image can be added to the similarity value, 
and can be processed more clearly and more efficiently. 
[0195] 

Data of the template image that have been bilaterally reversed can 
be generated indirectly, and data amount of the template image can be 
reduced. 

[0196] 

An object to be processed can be reduced rationally. 
[0197] 

Point biserial correlation coefficient filters can extract the center 
point of eyes and eyebrows. 
[0198] 

A mouth range, which is a facial organ, can be specified accurately.
[0199]

Watermark data can be embedded, in a concentrated manner, into a
face part that is liable to be falsified.
[0200]

Image correction can be selectively applied to the face part alone.

[DESCRIPTION OF DRAWINGS]
[Fig. 1] 

Fig. 1 is a block diagram of an image processing apparatus 
according to Embodiment 1 of the present invention. 
[Fig. 2] 

Fig. 2 (a) is a view showing a template image. 

Fig. 2 (b) is a view showing an edge extraction image (x 
component) of the template image. 

Fig. 2 (c) is a view showing an edge extraction image (y 
component) of the template image. 

[Fig. 3] 
Fig. 3 is an explanatory drawing for explaining the compression
processing of an evaluation vector. 
[Fig. 4] 

Fig. 4 (a) is a view showing an input image. 

Fig. 4 (b) is a view showing an edge extraction image (x 
component) of the input image. 

Fig. 4 (c) is a view showing an edge extraction image (y 
component) of the input image. 

[Fig. 5] 

Fig. 5 (a) is a view showing a map of similarity values. 
Fig. 5 (b) is a view showing the input image. 
[Fig. 6] 

Fig. 6 is an explanatory drawing for explaining a positive/negative 
reversal of an inner product. 
[Fig. 7] 

Fig. 7 (a) is a view showing a template image. 
Fig. 7 (b) is a view showing an input image. 
[Fig. 8]

Fig. 8 (a) is a view showing a template image.
Fig. 8 (b) is a view showing a template image.
[Fig. 9] 

Fig. 9 is a block diagram of an image processing apparatus 
according to Embodiment 2 of the present invention. 
[Fig. 10] 

Fig. 10 (a) is a graph showing conjugate properties.
Fig. 10 (b) is a graph showing conjugate properties.
[Fig. 11]

Fig. 11 (a) is a block diagram of a template image processing part
according to Embodiment 3 (first example) of the present invention. 

Fig. 11 (b) is a block diagram of a template image processing part 
according to Embodiment 3 (second example) of the present invention. 

[Fig. 12] 

Fig. 12 is a view showing an input image. 
[Fig. 13] 
Fig. 13 (a) is a view showing a template image.

Fig. 13 (b) is a view showing an edge extraction image.

Fig. 13 (c) is a view showing an enlargement/reduction template
image.

Fig. 13 (d) is a view showing a template image.

Fig. 13 (e) is a view showing an edge extraction image.

Fig. 13 (f) is a view showing an enlargement/reduction template
image.

[Fig. 14] 

Fig. 14 is a view showing a face extraction result. 
[Fig. 15] 

Fig. 15 is a block diagram of an image processing apparatus 
according to Embodiment 4 of the present invention. 
[Fig. 16] 

Fig. 16 is a view showing a peak pattern. 
[Fig. 17] 

Fig. 17 is a block diagram of an image processing apparatus 
according to Embodiment 5 of the present invention.
[Fig. 18]

Fig. 18 (a) is a view showing a template image.
Fig. 18 (b) is a view showing a mask pattern. 
[Fig. 19] 

Fig. 19 is a block diagram of an image processing apparatus 
according to Embodiment 6 of the present invention. 
[Fig. 20] 

Fig. 20 (a) is a view showing an original template image. 
Fig. 20 (b) is a view showing a template image that has been 
subjected to a bilateral reversal. 
[Fig. 21] 

Fig. 21 is a block diagram of a part of the image processing
apparatus according to Embodiment 7 of the present invention.
[Fig. 22] 

Fig. 22 (a) is a view showing an input image. 

Fig. 22 (b) is a view showing the eyes/eyebrow candidate range. 
Fig. 22 (c) is a view showing a recognition result.
[Fig. 23] 

Fig. 23 (a) is a view showing a filter shape according to 
Embodiment 7 of the present invention. 

Fig. 23 (b) is an explanatory drawing of an overall range mask. 
Fig. 23 (c) is an explanatory drawing of a range-1 mask.
Fig. 23 (d) is an explanatory drawing of a range-2 mask. 
[Fig. 24] 

Fig. 24 is a block diagram of a part of an image processing 
apparatus according to Embodiment 8 of the present invention. 
[Fig. 25] 

Fig. 25 (a) is a view showing an input image. 

Fig. 25 (b) is an explanatory drawing of a mouth candidate range. 

Fig. 25 (c) is a graph of projection values. 

Fig. 25 (d) is an explanatory drawing of a mask pattern. 

Fig. 25 (e) is a view showing a projection-value map image.

Fig. 25 (f) is a graph of projection values.
[Fig. 26]

Fig. 26 is a block diagram of a part of an image processing 
apparatus according to Embodiment 9 (first example) of the present 
invention. 

[Fig. 27] 

Fig. 27 (a) is a view showing an input image. 
Fig. 27 (b) is a view showing an output image. 
[Fig. 28] 

Fig. 28 is a block diagram of a part of the image processing 
apparatus according to Embodiment 9 (second example) of the present 
invention. 

[Fig. 29] 

Fig. 29 (a) is a view showing an input image.
Fig. 29 (b) is a view showing an output image.
[DESCRIPTION OF SYMBOLS] 

1 edge extraction unit 

2 evaluation vector generation unit 
3 orthogonal transformation unit

4 compression unit 

5 recording unit 

6 edge extraction unit 

7 evaluation vector generation unit 

8 orthogonal transformation unit 

9 compression unit 

10 multiplication unit 

11 inverse orthogonal transformation unit

12 map processing unit

13 conjugate compression unit

14 conjugate reconstruction unit

15 enlargement/reduction unit

16 addition unit 

17 orthogonal transformation unit 

18 compression unit 

19 recording unit 
20 mask pattern generation unit

21 orthogonal transformation unit

22 compression unit

23 recording unit

24 symmetric vector generation unit

25,26 orthogonal transformation unit

27 multiplication unit

28 inverse orthogonal transformation unit

29 projection data extraction unit

30 maximum point extraction unit

31 face image cutting-out unit

32 digital watermark embedding unit

33 image synthesizing unit

601 result-using part

34 image correction unit

51 orthogonal transformation unit

52 orthogonal transformation unit

53 orthogonal transformation unit

54 eye/eyebrow candidate range extraction unit 

55 orthogonal transformation unit 

56 square unit 

57 orthogonal transformation unit 

58 multiplication unit 

59 reduction unit 

60 multiplication unit 

61 multiplication unit

62,63,64 inverse orthogonal transformation unit

65 η map forming unit

66 eye/eyebrow center extraction unit
100 template image processing part

101, 102 template image processing part

200 input image processing part 

300 peak pattern processing part 

400 mask pattern processing part 
500 projection image processing part
600 result-using part
[DOCUMENT NAME] ABSTRACT
[ABSTRACT] 

[PROBLEM TO BE SOLVED] It is an object of the present invention 
to provide an image processing method and an image processing apparatus 
capable of obtaining an accurate, clear recognition result and capable of 
performing high speed processing. 

[SOLUTION] An image processing method for detecting an object 
from an input image by use of a template image. The image processing 
method includes a step of inputting a specified image with respect to both 
a template image and an input image, a step of calculating an edge normal 
direction vector of the specified image, a step of generating an evaluation 
vector from the edge normal direction vector, a step of subjecting the 
evaluation vector to orthogonal transformation, a step of performing a 
product sum calculation of corresponding spectral data with respect to 
each evaluation vector that has been subjected to orthogonal 
transformation and has been obtained for each of the template image and 
the input image, and a step of subjecting it to inverse orthogonal 
transformation and generating a similarity value map. The formula of the
similarity value, the orthogonal transformation, and the inverse orthogonal
transformation each have linearity. This makes possible pattern
recognition in which the component of the similarity value is not subjected
to positive/negative reversal through variations in brightness of the
background.
[SELECTED FIGURE] Fig. 1 
Fig.1

template image → 1 edge extraction
2,7 evaluation vector generation
4,9 compression
5 recording

input image → 6 edge extraction

11 inverse FFT

12 map processing → result
Fig.2(a) 

template image 

Fig.2(b) 

X component 

Fig.2(c) 

Y component 

Fig.3 

object to be processed 
Fig.4(a) 
input image

Fig.4(b) 

X component 

Fig.4(c) 

Y component 

Fig.5(a) 

maximum value 

Fig.7(a)(b)

background

specified object

template image input image
Fig.9

template image → 1 edge extraction
2,7 evaluation vector generation
4,9 compression
5 recording

input image → 6 edge extraction

11 inverse FFT

12 map processing → result

13 conjugate compression

14 conjugate restoring
Fig. 11 (a) 

1 edge extraction 

2 evaluation vector generation 

4 compression 

5 recording 

template image → 15 enlargement/reduction
Fig. 11(b) 

1 edge extraction 

2 evaluation vector generation 

4 compression 

5 recording

template image → 15 enlargement/reduction
Fig. 15

template image → 1 edge extraction
2,7 evaluation vector generation
4,9,18 compression
5,19 recording

input image → 6 edge extraction

11 inverse FFT

12 map processing → result
peak pattern → 17 FFT
Fig. 17 

template image → 1 edge extraction
2,7 evaluation vector generation
4,9,22 compression
5,23 recording

input image → 6 edge extraction
11 inverse FFT

12 map processing → result
20 mask pattern generation
Fig. 18(a) 

template image 
Fig. 18(b) 
mask pattern q 
value 0 part 
value 1/N part 
Fig. 19 

template image → 1 edge extraction
2,7 evaluation vector generation
4,9 compression
5 recording

input image → 6 edge extraction

11 inverse FFT

12 map processing → result

24 symmetric vector generation
Fig.21 

overall range mask(Ma) 
1st range mask(M1)
2nd range mask(M2)
12 map processing

input image → 54 eye/eyebrow candidate range extraction

56 square

62,63,64 inverse FFT

65 η map forming

66 eye/eyebrow center extraction → center position
Fig.23(a) 

don't care 
1st range 
2nd range 
1st range
Fig.23(b)
overall mask(Ma)
Fig.23(c)

1st range mask(M1)
Fig.23(d) 

2nd range mask(M2) 
Fig.24 

input image → 25 FFT
mask pattern → 26 FFT

28 inverse FFT

29 projection data extraction

30 maximum point extraction → mouth range output
Fig.25(c) 

maximum point 
mouth range 
projection value 
Fig.25(d) 
value 0 
value 1
Fig.25(e) 

mouth candidate position 
Fig.25(f) 
maximum point 
mouth range 
projection value 
Fig.26 

12 map processing → result output face position
input image → 31 face image cutting-out
face image → 32 digital watermark embedding
non-face image → 33 image synthesizing
Fig.27(b)

digital watermark embedded part
Fig.28

12 map processing → result output face position
input image → 31 face image cutting-out
face image → 34 image correction
non-face image → 33 image synthesizing
Fig.29(b) 

detected face range 
[DOCUMENT NAME] SPECIFICATION

[TITLE OF THE INVENTION] IMAGE PROCESSING METHOD 

AND APPARATUS 

[CLAIMS] 

[Claim 1] An image processing method for evaluating matching 
between a template image and an input image by use of a similarity value 
map, 

the method including a step of generating an evaluation vector for 
each of the template image and the input image, wherein the evaluation 
vector includes a component in which an edge normal direction vector of a 
specified image undergoes even-numbered times angular transformation. 
[Claim 2] An image processing method comprising:

a step of inputting a specified image for both a template image 
and an input image and calculating an edge normal direction vector of the 
specified image; 

a step of generating an evaluation vector from the edge normal 
direction vector; 
a step of subjecting the evaluation vector to orthogonal
transformation; 

a step of performing a product sum calculation of corresponding 
spectral data for each evaluation vector that has been subjected to 
orthogonal transformation and has been obtained for the template image 
and input image; and 

a step of subjecting a result of the product sum calculation to 
inverse orthogonal transformation and generating a map of similarity 
values; 

wherein a formula of the said similarity values, the said 
orthogonal transformation, and the said inverse orthogonal transformation 
each have linearity. 

[Claim 3] The image processing method of Claim 2, further comprising
a step of compressing each evaluation vector that has been subjected to
orthogonal transformation so as to reduce a processing amount.

[Claim 4] The image processing method of Claim 2 to 3, wherein for
the template image, the steps taken until the evaluation vector that has
been subjected to orthogonal transformation is compressed are executed
before the input image is input, and its result is stored in a recording
means.

[Claim 5] The image processing method of Claim 2 to 4, wherein the
evaluation vector is normalized with respect to a vector length.

[Claim 6] The image processing method of Claim 2 to 5, wherein the
evaluation vector of the template image is normalized by the number of
edge normal direction vectors.

[Claim 7] The image processing method of Claim 2 to 6, wherein a data
amount is reduced by use of complex conjugate properties of orthogonal
transformation before performing a product sum calculation, and the data
amount is restored after performing the product sum calculation.

[Claim 8] The image processing method of Claim 2 to 7, wherein the
template image is enlarged/reduced to various sizes, and the evaluation
vector of each size is subjected to addition processing.

[Claim 9] The image processing method of Claim 8, wherein for the
template image, the addition processing of the evaluation vector is carried
out after executing the step of compressing each evaluation vector so as to
reduce the processing amount.

[Claim 10] The image processing method of Claim 2 to 9, wherein the
template image is an image of a typified face.

[Claim 11] The image processing method of Claim 2 to 10, wherein a
peak pattern that makes a peak of the similarity value steep is prepared, 
and a result obtained by subjecting data of this peak pattern to orthogonal 
transformation is applied to the product sum calculation. 

[Claim 12] The image processing method of Claim 2 to 10, wherein a 
mask pattern that depends on the template image is formed, and a result 
obtained by subjecting data of this mask pattern to orthogonal 
transformation is applied to the product sum calculation. 

[Claim 13] The image processing method of Claim 12, wherein the said
mask pattern shows an average of a number of pixels in an image of the 
template image. 

[Claim 14] The image processing method of Claim 2 to 12, further 
comprising a step of, for the template image, processing positive and 
negative signs of the evaluation vector of the original template image and
generating an evaluation vector of a bilaterally symmetrical image with 
respect to the original template image, by which the generated evaluation 
vector is applied to the product sum calculation. 

[Claim 15] The image processing method of Claim 10, wherein a map 
of point biserial correlation coefficients is generated on the basis of an 
extracted face image, and a position of the face part is calculated. 

[Claim 16] The image processing method of Claim 10, wherein a 
distribution of projection values in a y-direction is calculated on the basis 
of the extracted face image by use of the mask pattern, and two maximum 
points are calculated from this distribution, and a range between these
maximum points is output as a mouth range. 

[Claim 17] The image processing method of Claim 10, wherein the 
input image is divided into only the face image and parts other than the 
face image on the basis of the extracted face image, and a digital 
watermark is embedded only into the face image, and the face image into 
which the digital watermark has been embedded and parts other than the 
face image are combined together and are output.

[Claim 18] The image processing method of Claim 10, wherein the 
input image is divided into only the face image and parts other than the 
face image on the basis of the extracted face image, and only the face 
image is edited, and the face image that has been edited and parts other 
than the face image are combined together and are output. 

[Claim 19] An image processing apparatus comprising:

a template image processing part for inputting a template image 
and calculating an edge normal direction vector of the template image, 
generating an evaluation vector from the edge normal direction vector, 
subjecting the evaluation vector to orthogonal transformation, and 
compressing the evaluation vector that has been subjected to the 
orthogonal transformation so as to reduce the processing amount; 

an input image processing part for inputting an input image and 
calculating an edge normal direction vector of the input image, generating 
an evaluation vector from the edge normal direction vector, subjecting 
the evaluation vector to orthogonal transformation, and compressing the
evaluation vector that has been subjected to the orthogonal transformation
so as to reduce the processing amount; 

multiplication means for performing a product sum calculation of 
corresponding spectral data about each evaluation vector that has been 
subjected to the orthogonal transformation and has been obtained for the 
template image and the input image; and 

inverse orthogonal transformation means for subjecting a result of 
the product sum calculation to inverse orthogonal transformation and 
generating a map of similarity values; 

wherein the evaluation vector includes a component in which an 
edge normal direction vector of a specified image undergoes 
even-numbered times angular transformation, and a formula of the 
similarity values, the orthogonal transformation, and the inverse 
orthogonal transformation each have linearity.

[Claim 20] The image processing apparatus of Claim 19, wherein the
said template image processing part is provided with a recording means
for recording the evaluation vector that has been compressed to reduce a
processing amount and that has been subjected to orthogonal
transformation, and 

a result obtained by compressing the evaluation vector that has 
been subjected to orthogonal transformation is stored in the recording 
means before inputting the input image. 

[Claim 21] The image processing apparatus of Claim 19, further
comprising: a conjugate compression means, provided between the said 
recording means and the said multiplication means, for reducing the data 
amount by use of complex conjugate properties of orthogonal 
transformation; and 

a conjugate restoring means, provided between the said 
multiplication means and the said inverse orthogonal transformation 
means, for restoring the data amount reduced by use of the complex 
conjugate properties of orthogonal transformation. 

[Claim 22] The image processing apparatus of Claim 19 to 21, further
comprising: an enlargement/reduction means for enlarging/reducing the
template image to various sizes; and an addition means for performing
addition processing of the evaluation vector of each size.

[Claim 23] The image processing apparatus of Claim 22, wherein the
said addition means performs addition processing of the evaluation vector 
of the template image after compressing the vector so as to reduce the 
processing amount. 

[Claim24] The image processing apparatus of Claim 19, further 
comprising a peak pattern processing part for subjecting a peak pattern by 
which a peak of a similarity value is made steep to orthogonal 
transformation and compressing the peak pattern that has been subjected 
to the orthogonal transformation so as to reduce the processing amount, 
wherein a result obtained by subjecting data of this peak pattern to the 
orthogonal transformation is applied to a product sum calculation of the 
said multiplication means. 

[Claim25] The image processing apparatus of Claim 19, further 
comprising a mask pattern processing part for forming a mask pattern that 
depends on the template image and generating data obtained by subjecting 
data of this mask pattern to orthogonal transformation and by compressing
it,

wherein a processing result of the said mask pattern processing part 
is applied to a product sum calculation of the said multiplication means.

[Claim26] The image processing apparatus of Claim 25, wherein the 
said mask pattern shows a mean of the number of pixels inside an image 
of the template image. 

[Claim27] The image processing apparatus of Claim 20, further 
comprising a symmetric vector generation means for processing positive 
and negative signs of the evaluation vector of an original template image 
recorded in the said recording means, and generating an evaluation vector 
of a bilaterally symmetric image with respect to the original template 
image, 

wherein the evaluation vector generated by the symmetric vector 
generation means is applied to a product sum calculation of the said 
multiplication means. 

[Claim28] The image processing apparatus of Claim 19, further 
comprising an η map forming means for forming a map of a point biserial
correlation coefficient on the basis of an extracted face image, and an
extraction means for calculating a position of a face part from the formed 
map. 

[Claim29] The image processing apparatus of Claim 19, further 
comprising a maximum point extraction means for calculating a projection 
value distribution in a y direction by use of a mask pattern on the basis of 
an extracted face image, and calculating two maximum points from this 
distribution, and outputting a range between the maximum points as a
mouth range.

[Claim30] The image processing apparatus of Claim 19, further 
comprising: a face image cutting-out means for separating an input image 
into only a face image and parts excluding the face image on the basis of 
an extracted face image; a digital watermark embedding means for 
embedding a digital watermark only into the face image; and an image 
synthesizing means for combining the face image into which the digital 
watermark has been embedded with parts excluding the face image and 
outputting them.

[Claim31] The image processing apparatus of Claim 19, comprising a
face image cutting-out means for separating an input image into only a 
face image and parts excluding the face image on the basis of an extracted 
face image; an image edge correction means for editing only the face 
image; and an image synthesizing means for combining an edited face 
image with parts excluding the face image and outputting them. 

[Claim32] The image processing method of Claim 10, further 
comprising: a step of cutting out a face image from the input image on the 
basis of an extracted face image; a step of extracting a facial inner image 
from the face image that has been cut out; a step of calculating a feature 
that serves to correct the face image on the basis of the extracted face 
image; a step of determining a correction function on the basis of the 
obtained feature; and a step of applying image correction based on the 
determined correction function at least onto the face image that has been 
cut out. 

[Claim33] The image processing method of Claim 32, wherein the said 
feature is a combination of two or more of brightness, chroma average, or
hue average.

[Claim34] The image processing apparatus of Claim 19, further 
comprising: face image cutting-out means for cutting out a face image 
from the input image on the basis of an extracted face image; face internal 
range extraction means for extracting a facial inner image from the face 
image that has been cut out; image feature extraction means for calculating 
a feature that serves to correct the face image on the basis of the extracted 
face image; correction function determining means for determining a 
correction function on the basis of the obtained feature; and image 
correction means for applying image correction based on the determined 
correction function at least onto the face image that has been cut out. 

[Claim35] The image processing apparatus of Claim 34, wherein the 
said feature is a combination of two or more of brightness, chroma average, 
or hue average. 

[DETAILED DESCRIPTION] 
[0 0 0 1] 
[TECHNICAL FIELD] 



The present invention relates to an image processing method for 
detecting an object from an input image by use of a template image, and 
relates to an image processing apparatus therefor. 
[0 0 0 2] 
[PRIOR ART] 

Conventionally, a technique is well known in which a template 
image is pre-registered, and the position in an input image of an image 
similar to the template image is detected by pattern matching between the 
input image and the template image. 

[0 0 0 3] 

However, since misrecognition is liable to occur depending on how
the background of the image similar to the template image is formed,
Japanese Published Unexamined Patent Application No. Hei-5-28273
discloses a technique that has been developed to solve this problem.

[0 0 0 4] 

In this publication, a similarity value between the template image
and the image corresponding to the template image is defined by the
following mathematical formula.
[Formula 1]

Cv: Correlation coefficient (similarity value) 
M: Number of pixels of template image in x direction 
N: Number of pixels of template image in y direction 
Sx: Derivative value of input image S in x direction 
Sy: Derivative value of input image S in y direction 
Tx: Derivative value of template image T in x direction 
Ty: Derivative value of template image T in y direction 
[0 0 0 5] 

In detail, an inner product (cos θ) of an angle θ between a normal
direction vector of the edge of the template image and a normal direction
vector of the edge of the input image is a component of the similarity
value.

[0 0 0 6] 

[TECHNICAL PROBLEM]
However, there is a problem in that, as described later in detail, if
the brightness of the background around an image of an object is
uneven, the sign of the inner product is reversed, so that the similarity
value no longer suits the real image and misrecognition easily occurs,
thus making it difficult to obtain a desirable recognition result.

[0 0 0 7] 

Additionally, the similarity value formula is nonlinear with respect 
to the normal direction vectors of the edges of the input and template 
images, and processing for the template image and processing for the input 
image must be performed simultaneously. 

[0 0 0 8] 

Further, the template image is scanned over the input image, and a
correlation calculation between the input image and the reference image
must be performed for each scanning point. In practice, therefore, the
resulting enormous amount of calculation makes real-time processing
impossible.



[0 0 0 9] 

It is therefore an object of the present invention to provide an 
image processing method and an image processing apparatus capable of 
obtaining an accurate, clear recognition result and capable of performing 
high speed processing. 
[0010]
[MEANS] 

In an image processing method according to a first aspect of the 
present invention, matching between a template image and an input image 
is evaluated by use of a similarity value map, and an evaluation vector is 
generated for each of the template and input images, and the evaluation 
vector includes a component in which a normal direction vector of an edge 
of a specified image undergoes even-numbered times angular 
transformation. 

[0011] 

With this structure, an image processing method capable of
obtaining a clear, accurate recognition result and capable of performing
high speed processing can be realized.
[0012]

[EMBODIMENTS OF THE INVENTION] 

In an image processing method according to a first aspect of the 
present invention, matching between a template image and an input image 
is evaluated by use of a similarity value map, and an evaluation vector is 
generated for each of the template and input images, and the evaluation 
vector includes a component in which a normal direction vector of an edge 
of a specified image undergoes even-numbered times angular 
transformation. 

[0 0 13] 

With this structure, the matching therebetween can be properly
evaluated with no influence on the similarity value even in a case in which
the sign of an inner product (cos θ) of an angle θ between
a normal direction vector of the edge of the template image and a normal
direction vector of the edge of the input image is reversed because of
unevenness in brightness of its background.



[0 0 14] 

An image processing method according to a second aspect of the 
present invention includes a step of inputting a specified image for each of 
a template image and an input image and calculating a normal direction 
vector of an edge of the specified image; a step of generating an evaluation 
vector from the edge normal direction vector; a step of subjecting the 
evaluation vector to an orthogonal transformation; a step of performing a 
product sum calculation of corresponding spectral data for each evaluation 
vector, which has been subjected to orthogonal transformation, obtained 
for the template image and input image; and a step of subjecting a result of 
the product sum calculation to an inverse orthogonal transformation and 
generating a map of similarity values; in which a formula of the similarity 
values, the orthogonal transformation, and the inverse orthogonal 
transformation each have linearity. 
[0015] 

With this structure, a Fourier transformation value of the template
image and a Fourier transformation value of the input image do not need
to be simultaneously calculated. In other words, the Fourier
transformation value of the template image can be obtained prior to that of
the input image, thus making it possible to lighten the processing burden
and improve processing speed.
[0016] 

An image processing method according to a third aspect of the 
present invention includes a step of compressing each evaluation vector, 
which has been subjected to the orthogonal transformation, so as to reduce 
the processing amount. 

[0 0 17] 

With this structure, what is processed is limited only to an effective 
component (e.g., low-frequency component), and processing speed can be 
improved. 

[0018] 

In an image processing method according to a fourth aspect of the 
present invention, for the template image, the steps taken until the 
evaluation vector that has been subjected to the orthogonal transformation
is compressed are executed before the input image is input, and its result is
stored in a recording unit. 
[0 0 19] 

With this structure, the processing relating to the template image is 
completed merely by reading from the recording unit, and processing 
speed can be improved. 
[0 0 2 0] 

In an image processing method according to a fifth aspect of the 
present invention, the evaluation vector is normalized with respect to a 
vector length. 

[0 0 2 1 ] 

With this structure, although the edge strength of the input image,
and hence the vector length, varies according to photographic conditions,
the stability of pattern extraction can be improved because the evaluation
is not affected by such length variations.

[0 0 2 2] 

In an image processing method according to a sixth aspect of the
present invention, the evaluation vector of the template image is
normalized by the number of edge normal direction vectors. 
[0 0 2 3] 

Therefore, independent of whether the number of edges of the 
template image is large or small, a similarity can be evaluated on the same 
scale by dividing it by n and normalizing it. 

[0 0 2 4] 

In an image processing method according to a seventh aspect of the 
present invention, a data amount is reduced by use of complex conjugate 
properties of an orthogonal transformation before performing a product 
sum calculation, and the data amount is restored after performing the 
product sum calculation. 
[0 0 2 5] 

With this structure, the data amount can be greatly reduced to 
improve processing speed, and memory capacity can be saved. 
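
The saving described above rests on a general property: the discrete Fourier transform of a real-valued sequence is conjugate-symmetric, so roughly half of the coefficients determine the rest. The following is a minimal one-dimensional sketch of that property only. The function names `dft`, `conjugate_compress`, and `conjugate_restore` are hypothetical, and the sketch does not claim to reproduce the internal structure of the means described in this specification.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def conjugate_compress(spectrum):
    """Keep coefficients 0..N//2; the rest are implied for real input."""
    return spectrum[: len(spectrum) // 2 + 1]

def conjugate_restore(half, n):
    """Rebuild all N coefficients using X[N-k] = conj(X[k])."""
    full = list(half)
    for k in range(len(half), n):
        full.append(half[n - k].conjugate())
    return full

# Hypothetical real-valued data standing in for one evaluation vector component.
signal = [0.0, 1.0, 0.5, -0.25, 0.0, 2.0, -1.0, 0.75]
spectrum = dft(signal)
half = conjugate_compress(spectrum)
restored = conjugate_restore(half, len(signal))
max_error = max(abs(a - b) for a, b in zip(spectrum, restored))
```

Because the product of two conjugate-symmetric spectra is again conjugate-symmetric, the product sum calculation can in principle be carried out on the reduced half and the full spectrum restored afterwards.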
[0 0 2 6] 

In an image processing method according to an eighth aspect of the
present invention, the template image is enlarged/reduced to various sizes,
and the evaluation vector of each size is subjected to addition processing. 
[0 0 2 7] 

With this structure, matching does not need to be repeatedly carried 
out for each size, and processing speed can be improved. 
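
The reason a single matching pass suffices is linearity: correlating the input with a sum of templates gives the same result as summing the separate correlations. The one-dimensional sketch below illustrates this under simplifying assumptions. The data and the names `correlate`, `small`, and `large` are hypothetical and merely stand in for evaluation vectors of two template sizes.

```python
def correlate(signal, template):
    """Cyclic correlation: out[x] = sum over i of signal[x + i] * template[i]."""
    n = len(signal)
    return [sum(signal[(x + i) % n] * template[i] for i in range(n))
            for x in range(n)]

signal = [0, 2, 1, 0, 3, 0, 0, 1]
small = [1, 1, 0, 0, 0, 0, 0, 0]   # hypothetical "reduced" template
large = [1, 0, 1, 0, 0, 0, 0, 0]   # hypothetical "enlarged" template

# Two matching passes, then add the similarity maps.
separate = [a + b for a, b in zip(correlate(signal, small),
                                  correlate(signal, large))]
# Add the templates first, then a single matching pass.
combined = correlate(signal, [s + l for s, l in zip(small, large)])
```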
[0 0 2 8] 

In an image processing method according to a ninth aspect of the 
present invention, for the template image, the addition processing of the 
evaluation vector is carried out after executing the step of compressing 
each evaluation vector so as to reduce the processing amount. 
[0 0 2 9] 

With this structure, what is subjected to addition processing can be 
reduced, and processing speed can be improved. 
[0 0 3 0] 

In an image processing method according to a tenth aspect of the 
present invention, the template image is an image of a typified face. 
[0031]
With this structure, not only the position of the face as a whole but
also the positions of main face parts, such as the eyes, nose, or mouth, can
be recognized.

[0 0 3 2] 

In an image processing method according to an 11th aspect of the 
present invention, a peak pattern that makes a peak of the similarity value 
steep is prepared, and a result obtained by subjecting data of this peak 
pattern to an orthogonal transformation is applied to the product sum 
calculation. 

[0 0 3 3] 

With this structure, a part similar to a template can be detected 
from the input image more clearly and stably while reflecting the peak 
pattern. 

[0 0 3 4] 

In an image processing method according to a 12th aspect of the 
present invention, a mask pattern that depends on the template image is 
formed, and a result obtained by subjecting data of this mask pattern to an
orthogonal transformation is applied to the product sum calculation.
[0 0 3 5] 

With this structure, closer detection can be performed while adding 
attributes other than the shape of the template image. 
[0 0 3 6] 

In an image processing method according to a 13th aspect of the 
present invention, the mask pattern shows an average of a number of 
pixels in an image of the template image. 
[0 0 3 7] 

With this structure, attributes of the template image can be reflected 
by a simple mask pattern. 
[0 0 3 8] 

An image processing method according to a 14th aspect of the 
present invention further includes a step of, for the template image, 
processing positive and negative signs of the evaluation vector of the 
original template image and generating an evaluation vector of a 
bilaterally symmetrical image with respect to the original template image,
wherein the generated evaluation vector is applied to the product sum
calculation.

[0 0 3 9] 

With this structure, the recording amount of the template image can 
be saved, and the evaluation vector of the template image that has been 
bilaterally reversed can be generated without direct calculation, thus 
making it possible to improve processing speed. 
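
One way to see why sign processing is enough: mirroring an image left to right changes an edge direction angle α into π − α, so the double-angle component cos 2α is unchanged and sin 2α changes sign. The sketch below applies this in the spatial domain. This is an assumption made for illustration, since the text above does not state in which domain the recorded vectors are processed, and the function name `mirror_evaluation_vectors` is hypothetical.

```python
def mirror_evaluation_vectors(vx, vy):
    """Evaluation vectors of the left-right mirrored template.

    vx, vy: 2-D lists holding the double-angle components (cos 2a, sin 2a).
    Mirroring maps the edge angle a to pi - a, so cos 2a is unchanged and
    sin 2a changes sign; each row is also reversed left to right.
    """
    mirrored_vx = [list(reversed(row)) for row in vx]
    mirrored_vy = [[-value for value in reversed(row)] for row in vy]
    return mirrored_vx, mirrored_vy

# Hypothetical one-row evaluation vector field.
vx = [[1.0, 0.0, -1.0]]
vy = [[0.0, 1.0, 0.5]]
mvx, mvy = mirror_evaluation_vectors(vx, vy)
# Mirroring twice returns the original field.
back_vx, back_vy = mirror_evaluation_vectors(mvx, mvy)
```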
[0 0 4 0] 

In an image processing method according to a 15th aspect of the 
present invention, a map of point biserial correlation coefficients is 
generated on the basis of an extracted face image, and a position of the 
face part is calculated. 

[0 0 4 1] 

With this structure, the position of the face part can be specified 
more accurately. 
[0 0 4 2] 

In an image processing method according to a 16th aspect of the
present invention, a distribution of projection values in a y-direction is
calculated on the basis of the extracted face image by use of the mask 
pattern, and two maximum points are calculated from this distribution, and 
an extent between these maximum points is output as a mouth range. 
[0 0 4 3] 

With this structure, the mouth range can be specified more 
accurately. 

[0 0 4 4] 

In an image processing method according to a 17th aspect of the 
present invention, the input image is divided into only the face image and 
parts other than the face image on the basis of the extracted face image, 
and a digital watermark is embedded only into the face image, and the face 
image into which the digital watermark has been embedded and parts other 
than the face image are combined and output. 
[0 0 4 5] 

With this structure, watermark data can be embedded intensively
into the face part, which is liable to be falsified.






[0046]

In an image processing method according to an 18th aspect of the 
present invention, the input image is divided into only the face image and 
parts other than the face image on the basis of the extracted face image, 
and only the face image is edited, and the face image that has been edited 
and parts other than the face image are combined and output. 

[0 0 4 7] 

With this structure, only the face image can be corrected without 
exerting an influence on parts other than the face. 
[0 0 4 8] 

In an image processing method according to a 32nd aspect of the 
present invention, a face image is cut out from the input image on the basis 
of the extracted face image, a facial inner image is then extracted from the 
face image that has been cut out, a feature that serves to correct the face 
image is then calculated on the basis of the extracted face image, a 
correction function is then determined on the basis of the obtained feature, 
and image correction based on the determined correction function is
applied at least onto the face image that has been cut out.
[0 0 4 9] 

With this structure, the face image is corrected according to the
feature of only the facial inner image without being adversely affected by
images that are not the face image, and therefore the viewability of the
face image can be reliably improved.

[0 0 5 0] 
(Embodiment 1) 

Referring to Fig. 1 through Fig. 8, the basic form of an image 
processing apparatus of the present invention will be described. 
[0 0 5 1 ] 

As shown in Fig. 1, the image processing apparatus according to 
Embodiment 1 includes two different processing parts, i.e., a template 
image processing part 100 and an input image processing part 200, and 
evaluates matching between a template image and an input image by use 
of a map of similarity values L. In this image processing apparatus, both 
the template image processing part 100 and the input image processing
part 200 perform orthogonal transformation with linearity, then perform
multiplication, and calculate the similarity values L by inverse orthogonal 
transformation. 

[0 0 5 2] 

It should be noted that FFT (Fast Discrete Fourier Transformation) 
is used as an orthogonal transformation in all embodiments described later. 
However, not only FFT but also Hartley transformation or number 
theoretic transformation can be used, and "Fourier transformation" in the 
following description can be replaced by Hartley transformation or 
number theoretic transformation. 
[0 0 5 3] 

Further, both the template image processing part 100 and the input 
image processing part 200 employ an inner product of edge normal 
direction vectors so that a correlation becomes proportionately higher with 
proximity between the directions of the edge normal direction vectors. 
This inner product is evaluated by use of an even-numbered times angle
expression. For convenience, only a double angle will be hereinafter
described as an example of the even-numbered times angle, but the same
effect of the present invention can be produced with other even-numbered
times angles such as a quadruple or sextuple angle.
[0 0 5 4] 

Next, the template image processing part 100 will be described. 
An edge extraction unit 1 applies differential processing (edge extraction) 
to a template image in x and y directions, and outputs an edge normal 
direction vector of the template image. 

[0 0 5 5] 

In this embodiment, a Sobel filter of [Formula 2] is used for the x 
direction, and a Sobel filter of [Formula 3] is used for the y direction. 
[0 0 5 6] 

An edge normal direction vector of the template image defined by 
the following formula is obtained with these filters. 
[Formula 4] 

[0057]
This embodiment takes an example in which an image of a person
in a specific posture who is walking through a pedestrian crossing is 
extracted from an input image of the vicinity of the pedestrian crossing. 
[0 0 5 8] 

The template image of the person is that of Fig. 2(a), for example. 
If the filter processing of Formula 2 is applied to the template image of Fig. 
2(a), a result (x component) as shown in Fig. 2(b) will be obtained, and, if 
the filter processing of Formula 3 is applied to the template image of Fig. 
2(a), a result (y component) as shown in Fig. 2(c) will be obtained. 
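
The filter processing above can be sketched as follows. The kernels of Formula 2 and Formula 3 are not reproduced in this text, so the sketch assumes the commonly used 3x3 Sobel kernels; the sign and scale convention of the specification may differ. The names `filter3x3` and `edge_normal_vectors` are hypothetical.

```python
# Assumed standard 3x3 Sobel kernels; Formulas 2 and 3 of the specification
# may use a different sign or scale convention.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]

def filter3x3(image, kernel):
    """Correlate a 2-D list with a 3x3 kernel; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(kernel[j][i] * image[y + j - 1][x + i - 1]
                            for j in range(3) for i in range(3))
    return out

def edge_normal_vectors(image):
    """Return (Tx, Ty): x and y derivative maps of the image."""
    return filter3x3(image, SOBEL_X), filter3x3(image, SOBEL_Y)

# A dark-to-bright vertical step edge: only the x component responds.
step = [[0, 0, 10, 10]] * 4
tx, ty = edge_normal_vectors(step)
```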
[0 0 5 9] 

An evaluation vector generation unit 2 inputs an edge normal 
direction vector of the template image from the edge extraction unit 1, and 
outputs the evaluation vector of the template image to an orthogonal
transformation unit 3 through the processing described later.

[0 0 6 0] 

First, the evaluation vector generation unit 2 normalizes the length
of the edge normal direction vector of the template image according to the
following formula.
[Formula 5]

[0 0 6 1] 

Generally, the edge strength of the input image varies according to
photographic conditions. However, the angular difference between the
edges of the input and template images (or a function value that depends
on this angular difference) is insensitive to the photographic conditions.

[0 0 6 2] 

Therefore, in the present invention, the edge normal direction
vector of the input image is normalized to be 1 in length in the input image
processing part 200, as described later. Correspondingly, the edge
normal direction vector of the template image is normalized to be 1 in
length in the template image processing part 100.
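
A minimal sketch of this normalization is given below. It shows that the same edge direction captured at two different contrasts yields the same unit vector. The function name `normalize` and the guard value `eps` are hypothetical and do not come from the specification.

```python
import math

def normalize(tx, ty, eps=1e-12):
    """Scale an edge normal direction vector (tx, ty) to unit length.

    Vectors shorter than eps are returned as (0, 0); eps is a hypothetical
    guard against division by zero, not a value from the source.
    """
    length = math.hypot(tx, ty)
    if length < eps:
        return 0.0, 0.0
    return tx / length, ty / length

# The same edge direction photographed at two contrasts.
weak = normalize(3.0, 4.0)
strong = normalize(30.0, 40.0)
```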

[0 0 6 3] 

As a result, the stability of pattern extraction can be improved.
Usually, it is desirable to have a normalization length of 1, but other
constants can be used.
[0 0 6 4] 

Concerning trigonometric functions, the following double angle
formulas hold, as is well known.
[Formula 6]
$$\cos(2\theta) = 2\cos^2\theta - 1, \qquad \sin(2\theta) = 2\cos\theta\,\sin\theta$$

[0 0 6 5] 

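The evaluation vector generation unit 2 calculates the evaluation
vector of the template image defined by the following formula.
[Formula 7]

When a is assumed to be a threshold (for fine edge removal), the evaluation
vector V of the template image is:

if $|\vec{T}| \ge a$: $\vec{V} = (V_x, V_y) = \frac{1}{n}\,(\cos 2\alpha,\ \sin 2\alpha)$

else: $\vec{V} = \vec{0}$

wherein $\alpha$ is the direction angle of the normalized vector of Formula 5,
and n is the number of edge normal direction vectors $\vec{T}$ with $|\vec{T}| \ge a$.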
[0 0 6 6] 

Formula 7 will be described. A vector smaller than the constant "a" is
set to the zero vector in order to remove noise, for example.
[0067]

Next, a description will be given of the fact that x and y 
components of this evaluation vector are each divided by "n" for 
normalization. 

[0 0 6 8] 

Generally, the shape of the template image is arbitrary, and its
edges take various shapes. For example, the number of edges may be
small as shown in Fig. 8(a), or larger (than that of Fig. 8(a)) as shown in
Fig. 8(b). Therefore, in this embodiment, the evaluation vector is divided
by "n" for normalization so that the similarity is evaluated on the same
scale regardless of whether the number of edges is small or large.
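
The threshold, the double-angle components, and the division by "n" can be put together as in the following sketch. It is written under assumptions: Formula 7 is not reproduced legibly in this text, so the comparison against the threshold (length not less than a) and the function name `evaluation_vectors` are illustrative choices rather than the specification's exact definition.

```python
import math

def evaluation_vectors(edges, a):
    """Build double-angle evaluation vectors from edge normal vectors.

    edges: list of (tx, ty) raw edge normal direction vectors.
    a:     threshold below which an edge is treated as noise.
    Returns a list of (vx, vy); weak edges map to the zero vector.
    """
    lengths = [math.hypot(tx, ty) for tx, ty in edges]
    n = sum(1 for length in lengths if length >= a)  # number of kept edges
    result = []
    for (tx, ty), length in zip(edges, lengths):
        if length < a or length == 0.0:
            result.append((0.0, 0.0))
            continue
        ux, uy = tx / length, ty / length            # unit vector (cos, sin)
        cos2, sin2 = 2 * ux * ux - 1, 2 * ux * uy    # double-angle formulas
        result.append((cos2 / n, sin2 / n))
    return result

# Two opposite edges, one perpendicular edge, and one weak (noise) edge.
edges = [(4.0, 0.0), (-4.0, 0.0), (0.0, 3.0), (0.1, 0.1)]
vectors = evaluation_vectors(edges, a=1.0)
```

In this toy input the two opposite vectors (4, 0) and (-4, 0) receive the same evaluation vector, which is the property the double-angle expression is meant to provide.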
[0 0 6 9] 

However, the normalizing processing of dividing by "n" does not
necessarily need to be performed, and it can be omitted in a situation in
which only one template image is used or in which only template images
having the same number of edges are used.
[0070]

Next, a description will be given of the fact that the x and y
components of Formula 7 are the cosine and sine, respectively, of double
the angle whose cosine and sine are the x and y components of Formula 5.

[0 0 7 1 ] 

If the inner product cos θ of an angle θ between an evaluation vector
T of the template image and an evaluation vector I of the input image is
used as a similarity scale as in the conventional technique, the following
problem will arise.

[0 0 7 2] 

For example, let us suppose that the template image is that of Fig. 
7(a), and the input image is that of Fig. 7(b). In the background of Fig. 
7(b), the left background part of an image of a specified object is brighter 
than the image of the specified object, and the right background part of the 
image of the specified object is darker than the image of the specified 
object. 

[0073]
Considering only the images, the images of the specified
object completely coincide with each other when the center of the template
image of Fig. 7(a) coincides with the center of the input image of Fig. 7(b),
and therefore the similarity value must be the maximum at this time.
Herein, if the edge normal direction vector that is directed outward from
the image of the object is positive, the edge normal direction vector must
be directed in the same direction (outward/inward) in the brighter
background part of Fig. 7(b) and in the darker background part thereof
when observed from the image of the specified object.

[0 0 7 4] 

However, if the brightness of the background of Fig. 7(b) is uneven 
in the right and left background parts with the specified object 
therebetween at this time, the directions are opposed as shown by arrows 
in Fig. 7(b) (i.e., outward from the specified object in the brighter 
background part, and inward from the specified object in the darker 
background part). 

[0075]
The similarity value should originally reach its maximum here, but it
does not necessarily become high in this case, and therefore
misrecognition easily occurs.

[0 0 7 6] 

A detailed description of the above problem will be given with
reference to Fig. 6. In the case where the angle between the evaluation
vector T of the template image and the evaluation vector I of the input image
is represented as θ, and its inner product, i.e., cos θ, is used as a similarity
value, the direction of the evaluation vector I of the input image has two
possibilities, i.e., I of Fig. 6 and I' precisely opposite thereto, because of
unevenness in brightness of the background that exists in the periphery of
the image of the specified object, as described above.
[0 0 7 7] 

Therefore, the inner product used as a similarity scale will have two
possible values, i.e., cos θ and cos θ'.
[0078]

Herein, θ + θ' = π, and cos θ' = cos(π − θ) = −cos θ.



[0 0 7 9] 

Therefore, if cos θ is used as the similarity scale, the similarity value
will be reduced when it should properly be increased, and will be
increased when it should properly be reduced.

[0 0 8 0] 

In other words, according to the prior art similarity value, matching
between the template image and the input image cannot be correctly
evaluated. As a result, disadvantageously, misrecognition easily occurs
with the conventional technique, and a recognition result will not be
clear even if images are shaped.
[0 0 8 1 ] 

Therefore, in the present invention, the cosine of the double angle of
θ, i.e., cos(2θ), is used for the formula of the similarity value. Thereby,
cos(2θ') = cos(2θ) from the double angle formula of Formula 6 even if
cos θ' = −cos θ. In other words, the similarity value rises irrespective of the
background when it should be increased.
Therefore, the matching therebetween can be correctly evaluated in spite
of the unevenness in brightness of the background. This applies to the
quadruple or sextuple angle as well as the double angle.
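
The invariance can be checked directly. Writing θ for the angle between T and I, and θ' for the angle between T and the reversed vector I', so that θ + θ' = π, a short derivation (added here as an illustration, with m denoting any positive integer) gives:

```latex
\begin{aligned}
\cos\theta' &= \cos(\pi - \theta) = -\cos\theta, \\
\cos(2\theta') &= \cos(2\pi - 2\theta) = \cos(2\theta), \\
\cos(2m\theta') &= \cos(2m\pi - 2m\theta) = \cos(2m\theta) \quad (m = 1, 2, 3, \dots).
\end{aligned}
```

The last line covers the quadruple (m = 2) and sextuple (m = 3) cases.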
[0 0 8 2] 

Moreover, the property that the similarity value monotonically
decreases with respect to angle θ does not change even if an
even-numbered times angle cosine is used.

[0 0 8 3] 

Therefore, according to the present invention, patterns can be stably 
extracted by an even-numbered times angle evaluation regardless of the 
brightness condition of the background. 
[0 0 8 4] 

More specifically, the similarity value is defined by the following 
formula in the present invention. 

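[Formula 8]
Similarity value:
$$L(x, y) = \sum_{i}\sum_{j}\left[K_x(x+i,\,y+j)\,V_x(i,j) + K_y(x+i,\,y+j)\,V_y(i,j)\right]$$

$\vec{K} = (K_x, K_y)$: Evaluation vector of input image

$\vec{V} = (V_x, V_y)$: Evaluation vector of template image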
[0 0 8 5] 

Since Formula 8 consists only of addition and multiplication, the 
similarity value is linear for each evaluation vector of the input and 
template images. Therefore, when Formula 8 is subjected to Fourier 
transformation, Formula 9 is obtained by a discrete correlation theorem of 
Fourier transformation (reference: Fast Fourier Transformation translated 
by Hiroshi Miyakawa published by Science and Technology). 
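[Formula 9]
$$\tilde{L}(u, v) = \tilde{K}_x(u, v)\,\tilde{V}_x(u, v)^{*} + \tilde{K}_y(u, v)\,\tilde{V}_y(u, v)^{*}$$

$\tilde{K}_x, \tilde{K}_y$: Fourier transformation values of Kx, Ky

$\tilde{V}_x^{*}, \tilde{V}_y^{*}$: Complex conjugates of the Fourier transformation values of Vx, Vy

Hereinafter, a Fourier transformation value is represented by ~, and a
complex conjugate is represented by *.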
[0 0 8 6] 

The similarity value of Formula 8 is obtained by subjecting
Formula 9 to inverse Fourier transformation. From careful consideration
of Formula 9, the following two points will become apparent.
[0087]

First, (1) in the domain after orthogonal transformation, the Fourier
transformation value of the template image and the Fourier transformation
value of the input image need only be multiplied together.

[0 0 8 8] 

Secondly, (2) the Fourier transformation value of the template 
image and the Fourier transformation value of the input image do not need 
to be simultaneously calculated, and the Fourier transformation value of 
the template image may be calculated prior to that of the input image. 
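
The two points above can be illustrated numerically. The sketch below assumes that Formula 8 is a cyclic correlation of the form L(x, y) = ΣΣ [Kx(x+i, y+j)·Vx(i, j) + Ky(x+i, y+j)·Vy(i, j)], with K the evaluation vector of the input image and V that of the template image, and that Formula 9 is the corresponding product of spectra. Both forms are assumptions inferred from the surrounding description. A naive DFT stands in for the FFT, and all function names and data are hypothetical.

```python
import cmath

def dft2(a, inverse=False):
    """Naive 2-D DFT (or inverse DFT) of a rectangular list of lists."""
    h, w = len(a), len(a[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = 2 * cmath.pi * (u * x / w + v * y / h)
                    total += a[y][x] * cmath.exp(sign * 1j * angle)
            out[v][u] = total / (h * w) if inverse else total
    return out

def similarity_direct(kx, ky, vx, vy):
    """Cyclic form of the assumed Formula 8, evaluated at every shift."""
    h, w = len(kx), len(kx[0])
    return [[sum(kx[(y + j) % h][(x + i) % w] * vx[j][i] +
                 ky[(y + j) % h][(x + i) % w] * vy[j][i]
                 for j in range(h) for i in range(w))
             for x in range(w)] for y in range(h)]

def similarity_fourier(kx, ky, vx, vy):
    """Same map via the assumed Formula 9: multiply spectra, then invert."""
    fkx, fky, fvx, fvy = (dft2(m) for m in (kx, ky, vx, vy))
    h, w = len(kx), len(kx[0])
    product = [[fkx[v][u] * fvx[v][u].conjugate() +
                fky[v][u] * fvy[v][u].conjugate()
                for u in range(w)] for v in range(h)]
    return [[c.real for c in row] for row in dft2(product, inverse=True)]

def cyclic_shift(a, dx, dy):
    """Shift a 2-D list by (dx, dy) with wrap-around."""
    h, w = len(a), len(a[0])
    return [[a[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]

# Toy template evaluation vector, zero-padded to the input size.
vx = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
vy = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
# Toy input: the same pattern shifted by (x, y) = (2, 1).
kx, ky = cyclic_shift(vx, 2, 1), cyclic_shift(vy, 2, 1)

direct = similarity_direct(kx, ky, vx, vy)
fourier = similarity_fourier(kx, ky, vx, vy)
peak = max((value, x, y) for y, row in enumerate(direct)
           for x, value in enumerate(row))
max_diff = max(abs(d - f) for dr, fr in zip(direct, fourier)
               for d, f in zip(dr, fr))
```

In this arrangement the spectra of `vx` and `vy` depend only on the template, so they could be computed once and stored before any input image arrives.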
[0089]

Therefore, in this embodiment, the template image processing part
100 is provided with a recording unit 5 that records the output of a
compression unit 4 before the input image is input. Thereby, after the
input image is input to the input image processing part 200, the template
image processing part 100 has no need to process the template image, and
processing capacity can be concentrated on the input image processing
part 200, a multiplication unit 10, and the subsequent stages, thus
making it possible to improve the processing speed.
[0090]

Next, a description will be given of a structure subsequent to the 
evaluation vector generation unit 2. As shown in Fig. 1, in the template 
image processing part 100, the evaluation vector of the template image 
that is output from the evaluation vector generation unit 2 is subjected to 
Fourier transformation by the orthogonal transformation unit 3, and is 
output to the compression unit 4. 
[0091]

The compression unit 4 compresses the evaluation vector that has
been subjected to Fourier transformation, and stores it in the recording unit
5. As shown in Fig. 3, the evaluation vector subjected thereto includes
various frequency components in both the x and y directions. Experiments
carried out by the present inventors have shown that sufficient
accuracy can be obtained by processing only the low frequency components
(e.g., the halves on the low frequency side in the x and y directions,
respectively) without processing all of the frequency components. In Fig.
3, the range (-a ≤ x ≤ a, -b ≤ y ≤ b) having no oblique lines is the original one,
and the range (-a/2 ≤ x ≤ a/2, -b/2 ≤ y ≤ b/2) having oblique lines is the
one that has been compressed. That is, the processing amount becomes
1/4 of what it was before.
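
The compression described above can be illustrated by the following minimal sketch in Python with NumPy. It is given under assumptions: the spectrum has even dimensions, "low frequency" means the central block after the zero frequency is shifted to the center, and the function name compress_low_freq is hypothetical.

```python
import numpy as np

def compress_low_freq(spectrum):
    """Keep the low-frequency half of a 2-D spectrum along each axis."""
    H, W = spectrum.shape
    centered = np.fft.fftshift(spectrum)   # move the zero frequency to the center
    top, left = H // 4, W // 4
    return centered[top:top + H // 2, left:left + W // 2]

image = np.random.default_rng(1).standard_normal((64, 64))
full = np.fft.fft2(image)
low = compress_low_freq(full)   # 32 x 32 block, one quarter of the data
```

The retained block holds one quarter of the original coefficients, which corresponds to the hatched range of Fig. 3.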
[0092]

Thereby, an object to be processed can be compressed, and the 
processing speed can be improved. 
[0093]

The compression unit 4 and the recording unit 5 can be omitted 
when a data amount is small, or high-speed processing is not required. 
[0094]

Next, the input image processing part 200 will be described. The 
input image processing part 200 performs almost the same processing as 
the template image processing part 100. That is, according to Formula 2 
and Formula 3, the edge extraction unit 6 outputs an edge normal direction 
vector of the input image defined by the following formula.
[Formula 10]

Edge normal direction vector of input image
Ix: Derivative value of input image in x direction
Iy: Derivative value of input image in y direction
[0095]

The evaluation vector generation unit 7 inputs the edge normal 
direction vector of the input image from the edge extraction unit 6, and 
outputs an evaluation vector of the input image defined by the following 
two formulas. 

[Formula 11] 
Length normalized vector of input image

[Formula 12] 

When assumed to be a threshold (for fine edge removal), the evaluation 

vector K of the input image is: 

if 

else 

[0096]
The input image processing part 200 differs from the template
image processing part 100 only in that the normalizing processing of
division by "n" is not performed. In other words, evaluation by the
even-numbered times angle, the processing of normalizing the length to 1, and
the noise removal processing are performed in the same way as in the
template image processing part 100.
[0097]

Next, a structure subsequent to the evaluation vector generation 
unit 7 will be described. As shown in Fig. 1, in the input image 
processing part 200, the evaluation vector of the input image that is output 
from the evaluation vector generation unit 7 is subjected to Fourier 
transformation by the orthogonal transformation unit 8, and is output to 
the compression unit 9. 
[0098]

The compression unit 9 compresses the evaluation vector that has 
been subjected to Fourier transformation, and outputs it to the 
multiplication unit 10. Herein, the compression unit 9 compresses the
object to be processed to the same frequency band as the compression unit
4 (e.g., the halves on the low frequency side in the x and y directions,
respectively, in this embodiment).
[0099]

The compression unit 9 can be, of course, omitted when the data 
amount is small, or when high-speed processing is not required, and the 
compression unit 9 is likewise omitted when the compression unit 4 is 
omitted in the template image processing part 100. 
[0100] 

Next, the multiplication unit 10 and the structure subsequent
to it will be described. When the processing in the template image
processing part 100 and in the input image processing part 200 is
completed, the multiplication unit 10 receives the Fourier transformation value
of each evaluation vector of the template and input images from the
recording unit 5 and the compression unit 9.
[0101] 

Thereafter, the multiplication unit 10 performs a product-sum-calculation
according to Formula 9, and outputs the result (i.e., the Fourier transformation
value of the similarity value L) to an inverse orthogonal transformation
unit 11.

[0102] 

The inverse orthogonal transformation unit 11 subjects the Fourier
transformation value of the similarity value L to inverse Fourier
transformation, and outputs the map L(x,y) of the similarity value L to a
map processing unit 12. The map processing unit 12 extracts a
high-value point (peak) from the map L(x,y), and outputs its position and
its value. The map processing unit 12 and the structure subsequent to
it can be freely arranged as necessary.
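
The peak extraction performed by the map processing unit 12 can be sketched as follows in Python with NumPy. This is only one possible arrangement, assuming that a single global maximum is wanted. The function name extract_peak and the sample map are hypothetical.

```python
import numpy as np

def extract_peak(similarity_map):
    """Return ((row, col), value) of the largest entry of a 2-D map."""
    flat_index = np.argmax(similarity_map)
    position = np.unravel_index(flat_index, similarity_map.shape)
    value = float(similarity_map[position])
    return (int(position[0]), int(position[1])), value

L_map = np.zeros((8, 8))
L_map[5, 2] = 3.0   # artificial peak for illustration
peak_position, peak_value = extract_peak(L_map)
```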
[0103] 

Next, a processing example by the template image of Fig. 2 will be 
described with reference to Fig. 4 and Fig. 5. If the input image is as 
shown in Fig. 4(a), the edge extraction unit 6 extracts an edge component 
in the x direction as shown in Fig. 4(b), and extracts an edge component in 
the y direction as shown in Fig. 4(c).
[0104]

As a result of the aforementioned processing, a similarity value
map L(x,y) shown in Fig. 5(a) is obtained. Herein, the tip of the
arrow indicated as the "maximum value" is the peak of this map, and, as is
apparent from a comparison with the input image of Fig. 5(b), the
correct position is clearly identified as a single
point.

[0105] 

In the conventional technique, the template image is successively
scanned over the input image and the value of Formula 1 is obtained at
each position, so the number of products that must be calculated is 2AB,
wherein the size of the input image is A (= 2^γ) and the size of
the template image is B. Herein, the number of calculations is evaluated
by the number of products, which have high calculation costs.
[0106] 

In contrast, in this embodiment, FFT is performed twice by the
orthogonal transformation units 3 and 8, the product sum calculation is then
performed by the multiplication unit 10, and inverse FFT is performed
once by the inverse orthogonal transformation unit 11. The number of
calculations is merely the number of products,
3{(2γ-4)A+4}+2A.
[0107] 

Comparing the two, the number of products according to this embodiment
is about 1/100 of the number of products
according to the conventional technique when A = 256×256 = 2^16 and B = 60×60.
As a result, remarkably high-speed processing can be carried out.
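
The comparison can be reproduced with a few lines of arithmetic. The sketch below, in Python, evaluates the two product counts stated above for A = 2^16 and B = 60×60, where gamma denotes the exponent in A = 2^gamma. The function names are hypothetical.

```python
def conventional_products(A, B):
    """Product count of the conventional technique: 2AB."""
    return 2 * A * B

def fft_products(A, gamma):
    """Product count of this embodiment: 3{(2*gamma - 4)A + 4} + 2A."""
    return 3 * ((2 * gamma - 4) * A + 4) + 2 * A

gamma = 16
A = 2 ** gamma            # 256 x 256 input image
B = 60 * 60               # template size
conventional = conventional_products(A, B)   # 471,859,200 products
proposed = fft_products(A, gamma)            # 5,636,108 products
ratio = conventional / proposed              # roughly 84
```

With these figures the reduction factor is roughly 84, that is, on the order of 1/100.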
[0108] 

The discrete correlation theorem of Fourier transformation cannot
be applied to a nonlinear formula such as Formula 1 of the conventional
technique.

[0109] 

Therefore, in the conventional technique, processing for the
template image cannot be performed prior to that for the input image, as
is done in Fig. 1 of this embodiment. In other words, in the conventional
technique, both the template image and the input image must be
processed simultaneously. In this respect as well, the processing speed in
this embodiment becomes higher than that in the conventional technique.
[0110]

(Embodiment 2) 

In this embodiment, a conjugate compression unit 13 and a 
conjugate restoring unit 14 are added to the elements of Fig. 1 as shown in 
Fig. 9. The conjugate compression unit 13 further halves the Fourier 
transformation value of the evaluation vector of the template image that 
has been read from the recording unit 5 by use of complex conjugate 
properties of Fourier transformation. 
[0111] 

This point will be described. The following formula holds
for the spectrum obtained by Fourier transformation of real-valued
data.
[Formula 13]
[0112]

In detail, a spectral value at a specific point is equal to the complex
conjugate of the spectral value at the position symmetrical to that point in the uv
coordinate system. Use of this property makes it possible to halve the
processing amount, since only half of the uv coordinate system needs to be
processed, as shown in Fig. 10(a).

[0113]

Since the compression unit 4 and the compression unit 9 further
compress it to 1/4, the data amount can be reduced to 1/2 × 1/4 = 1/8 of the
original data as shown in Fig. 10(b). Therefore, the processing speed can be improved,
and memory capacity can be saved.
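
The conjugate property used here can be observed with the following minimal sketch in Python with NumPy. It assumes a real-valued array as a stand-in for one component of an evaluation vector, and it uses NumPy's rfft2 and irfft2 merely as an illustration of storing only the non-redundant half. The conjugate compression unit 13 and the conjugate restoring unit 14 are not claimed to operate this way.

```python
import numpy as np

rng = np.random.default_rng(2)
field = rng.standard_normal((32, 32))   # any real-valued array
F = np.fft.fft2(field)

# F(u, v) equals the conjugate of F(-u, -v); negative indices wrap modulo 32.
u = np.arange(32)
mirrored = F[np.ix_((-u) % 32, (-u) % 32)]
symmetric = np.allclose(F, np.conj(mirrored))

# rfft2 keeps about half of the columns, and irfft2 restores the full array.
half = np.fft.rfft2(field)
restored = np.fft.irfft2(half, s=field.shape)
```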
[0114] 

As a result, in this embodiment, the Fourier transformation value
that the conjugate compression unit 13 has read from the recording unit 5
is halved, and is output to the multiplication unit 10. Thereafter, the
conjugate restoring unit 14 performs processing by which the output
from the multiplication unit 10 is doubled, and outputs it to the inverse
orthogonal transformation unit 11.
[0115] 

These constitute the basis of the present invention. A useful 
technique will be hereinafter disclosed by application of this basis. 

[0116]
(Embodiment 3) 

In this embodiment, a first example and a second example will be 
described. A technique for performing face extraction by using an 
enlarged/reduced template image is disclosed in either example. 
According to this technique, processing by use of similar templates 
different in size can be carried out efficiently and at high speed. The 
application of this technique is not limited only to face extraction. 
[0117]

「First Example」

Fig. 11(a) is a block diagram of a template image processing part 
according to Embodiment 3 (first example) of the present invention. 
Some modifications have been made to the template image processing part
100 of Fig. 1 or Fig. 9 as shown in the figure.
[0118]

First, when a template image is input to a template image
processing part 101, the image is enlarged/reduced by an
enlargement/reduction unit 15. Thereafter, the enlarged/reduced template
image is output to an edge extraction unit 1.

[0119]

When an evaluation vector generation unit 2 outputs the evaluation 
vector of the template image, an addition unit 16 applies addition 
processing to this, and data obtained by the addition processing is output 
to an orthogonal transformation unit 3. 
[0120]

The addition unit 16 performs addition processing for each size 
range according to the following formula. 

[Formula 14] 
Evaluation vector of template image 
wherein the size range is a~b.
[0121]

Although image processing is carried out by use of one template 
image in Embodiments 1 and 2, templates with multiple sizes are 
processed by use of a plurality of templates in this embodiment. 

[0122]

Moreover, the template sizes are divided into specific ranges, and
the template processing results within each range are superimposed.
[0123]

Fig. 12 shows an example of an input image that includes a face
image of a person. In this example, a template image of an upright face,
such as that of Fig. 13(a), is prepared. A template image of a face
inclined at a specific angle as shown in Fig. 13(d) is further prepared.
The angle at which the prepared face is inclined can be
appropriately selected.
[0124]

When the edges of the template images of Figs. 13(a) and 13(d) are
extracted, the edge images shown in Figs. 13(b) and 13(e), respectively, are
obtained in this embodiment. Further, when the image of Fig. 13(a) or
13(d) is input, an evaluation vector corresponding to the edge image is
generated as shown in Fig. 13(c) or 13(f).
[0125] 

As in Embodiments 1 and 2, data concerning the template images 
are stored in the recording unit 5. 
[0126]

When the input image is input, the processing by the input image
processing part 200, the multiplication unit 10, and the subsequent stages is
performed in the same way as in Embodiments 1 and 2.

[0127]

Corresponding similarity value maps are calculated for 
superimposed template data of all size ranges of the template images. 
[0128] 

Thereby, a detection result, such as that shown in Fig. 14, is 
obtained. As is apparent from Fig. 14, not only the overall position of the 
face but also its size and the positions of the main face parts, such as the
eyes, nose, and mouth, can be recognized.
[0129] 

At this time, it is desirable to display the processing result on a 
display or to output it to a printer as shown in Fig. 14 in the stage 
subsequent to the map processing unit 12. 

[0130]

According to the First Example, the following effects can be obtained.

(1) Even if template data within a specific size range are added together and
image processing is performed with the superimposed templates, a part similar to the
template often still shows a high similarity value. When similarity
values are calculated separately for all sizes, the processing of the product sum
calculation part and the processing of the inverse orthogonal
transformation part must be repeated NM times, wherein N is
the number of templates and M is the number of sizes. On the
other hand, the number of repetitions is reduced to NM/H by use of the
superimposed template, wherein H is the width of a superimposed range.
Therefore, efficiency is improved, and image
processing can be carried out at a higher speed for face extraction, for
example.

[0131] 

(2) Additionally, rather than merely outputting the positional data for a
face, rough candidate ranges of the eyes, nose, and mouth can also be
extracted by superimposing the template on the face candidate position
as shown in Fig. 14.
[0132]

The positions of face parts, such as the eyes, nose, and mouth, can
also be extracted more accurately by processing the images in these ranges
more finely, as described in the following embodiments.
[0133] 

「Second Example」

In this example, a template image processing part 102 is
constructed as shown in Fig. 11(b). That is, the addition
unit 16 is moved to a position between the compression unit 4 and the recording unit 5,
in comparison with the First Example. Accordingly, the addition unit 16
performs addition processing according to the following formula.
[Formula 15] 

wherein the size range is a~b. 

[0134]

The similarity value formula that is linear before being subjected to 
Fourier transformation is still linear after being subjected thereto. 
Therefore, the position of the addition unit 16 can be changed from the 
First Example to the Second Example, for example. 

[0135] 

Thereby, the amount of data to be processed by the addition unit 16 can be
reduced more than in the First Example because the data has already been compressed
by the compression unit 4. This favorably improves the processing
speed.
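
The linearity that permits the addition unit 16 to be moved can be illustrated by the following minimal sketch in Python with NumPy. It assumes that the addition is a plain sum over the size range and uses random arrays as hypothetical stand-ins for one evaluation-vector component of each template size. Compression is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-ins for one evaluation-vector component at sizes a..b.
templates = [rng.standard_normal((32, 32)) for _ in range(4)]

# First Example: add in the spatial domain, then transform once.
sum_then_fft = np.fft.fft2(np.sum(templates, axis=0))

# Second Example: transform each template, then add the spectra.
fft_then_sum = np.sum([np.fft.fft2(t) for t in templates], axis=0)
```

Both orders give the same spectrum to within floating-point error, so the choice between them is a matter of how much data the addition has to handle.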

[0136] 
(Embodiment 4) 

Embodiment 4 will be described with reference to Fig. 15 and Fig.
16. This embodiment discloses an efficient technique for turning the
value of the maximum point of the similarity value map described in
Embodiments 1 and 2 into a stronger peak value.
[0137]

Generally, in the similarity value map, a peak appears in a part 
overlapping with the template image. In this embodiment, a peak pattern 
"p" around and including a maximum point is used as a filter for the 
similarity value map, and the value of a part similar to the peak pattern in 
the similarity value map is amplified. 
[0138]

As shown in Fig. 15, in this embodiment, a peak pattern processing
part 300 is added to the structure of Fig. 1 showing Embodiment 1.
[0139] 

Fig. 16 shows a mask for this peak pattern. As indicated in Fig.
16, this peak pattern is normalized so that its average value is 0.
[0140] 

In the peak pattern processing part 300, an orthogonal
transformation unit 17 subjects this peak pattern to Fourier transformation,
and a compression unit 18 compresses the Fourier transformation value and
records the compressed data in a recording unit 19.
[0141] 

Since the mask is used, the similarity value is given
not by Formula 8 but by the following formula that reflects the mask.
[Formula 16] 

Peak pattern low frequency spectrum 
L(u,v): Similarity value before filtering 
M(u,v): Similarity value after filtering 
[0142]

Therefore, the multiplication unit 10 reads data from the recording 
unit 5, the recording unit 19, and the compression unit 9, then performs a 
product-sum-calculation, and outputs the Fourier transformation value of 
the similarity value corrected by the peak pattern. 
[0143]

The similarity value map L could instead be filtered with the peak pattern
directly, according to the following formula, but this would inefficiently
require a large number of product sum calculations.
[Formula 17]

[0144]

In contrast, in this embodiment, simple, accurate processing is
performed according to Formula 16 without the large number of
calculations required by Formula 17.
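
The difference between the two routes can be sketched as follows in Python with NumPy. The sketch rests on assumptions that are not confirmed by the text: Formula 17 is taken to be a cyclic correlation of the map L with the peak pattern p anchored at its top-left corner, and Formula 16 is taken to be the product of the spectrum of L with the complex conjugate of the spectrum of p. The function names are hypothetical.

```python
import numpy as np

def filter_direct(L, p):
    """Spatial-domain filtering: cyclic correlation of map L with pattern p."""
    H, W = L.shape
    h, w = p.shape
    M = np.zeros_like(L)
    for y in range(H):
        for x in range(W):
            for m in range(h):
                for l in range(w):
                    M[y, x] += p[m, l] * L[(y + m) % H, (x + l) % W]
    return M

def filter_fft(L, p):
    """Frequency-domain filtering: one product per spectral coefficient."""
    P = np.fft.fft2(p, s=L.shape)   # zero-pad the pattern to the map size
    return np.fft.ifft2(np.fft.fft2(L) * np.conj(P)).real

rng = np.random.default_rng(4)
L = rng.standard_normal((16, 16))   # stand-in similarity value map
p = rng.standard_normal((3, 3))
p -= p.mean()                       # zero-mean pattern, as for the mask of Fig. 16
M_direct = filter_direct(L, p)
M_fft = filter_fft(L, p)
```

In the embodiment the spectrum of L is already available at the multiplication unit 10, so the filtering costs only one additional product per coefficient before the inverse transformation.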
[0145] 

Therefore, according to this embodiment, the peak point of the 
similarity value map can be efficiently amplified. Additionally, a part 
similar to the template can be clearly and stably detected from the input 
image while reflecting the peak pattern. 

[0146] 
(Embodiment 5) 

In this embodiment, the similarity judgment takes into account a pixel
mean within the range of the template image in addition to the similarity
value between the edges of the input image and the edges of the template image.
[0147]



This structure is shown in Fig. 17. Like Fig. 15, a mask pattern 
processing part 400 is added to the structure of Fig. 1 of Embodiment 1. 
[0148] 

However, the structure of Fig. 17 differs from that of Fig. 15 in the 
fact that the mask pattern processing part 400 is used not to input a peak 
pattern but to input a template image, and that a mask pattern generation 
unit 20 is provided for generating a mask pattern that depends on this 
image. 

[0149] 

Like Fig. 15, the output of the mask pattern generation unit 20 is 
subjected to Fourier transformation by the orthogonal transformation unit 
21, is then compressed by the compression unit 22, and is recorded in the 
recording unit 23. 
[0150]

Further, since a mask is used, the similarity value formula is 
expressed not as Formula 8 but as the following formula that reflects the 
mask. 






[Formula 18] 

Q(x,y): Pixel-average-added similarity value 
q(l,m): Mask pattern 
L(x,y): Similarity value before filtering 
I(x,y): Input image data 
[0151] 

For the same reason as mentioned in Embodiment 4, a large number
of product sum calculations would inefficiently be needed if the calculation
were performed directly in this way.

[0152]

Formula 19 is obtained by subjecting this to Fourier transformation, 
and the calculation can be performed very simply. 
[Formula 19] 

[0153] 

Therefore, the multiplication unit 10 performs a
product-sum-calculation according to Formula 19.
[0154]

Next, the relationship between the template image and the mask 
pattern will be described with reference to Fig. 18. Herein, in order to 
add a pixel mean in the range of the template image to a similarity 
judgment, the mask pattern generation unit 20 generates a mask pattern as 
shown in Fig. 18(b) for a template image as shown in Fig. 18(a).
[0155]

In greater detail, in the template image of Fig. 18(a), a value of 1/N 
is set at each point of the inner part (inside the circle) whose pixel mean is 
to be calculated, and a value of 0 is set at other points. Herein, N is the 
number of points in the inner part, and a result of an addition of the value 
of all points of the mask pattern is 1. 
[0156]

According to this embodiment, a mean of pixel values inside the
image can also be added to the similarity value, and a specified object can
be extracted from the input image more accurately and more efficiently.

[0157]
A mean of the squared value of each pixel can be calculated in the same way:
the input image processing part forms data in which each pixel of the input
image is squared, and the same processing is applied thereto.

[0158] 

Therefore, a variance as well as a mean within a range can
be efficiently calculated.
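
The mask pattern and the mean and variance calculations can be sketched as follows in Python with NumPy. This is an illustration under assumptions: the mask is a centered disk with value 1/N inside, the mask is correlated cyclically with the image through the FFT, and the variance is obtained as the mean of squares minus the square of the mean. The function names disk_mask and correlate_fft are hypothetical.

```python
import numpy as np

def disk_mask(size, radius):
    """Mask with value 1/N inside a centered disk and 0 elsewhere."""
    yy, xx = np.mgrid[:size, :size]
    c = (size - 1) / 2
    inside = (yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2
    return inside / inside.sum()   # N = number of points inside the disk

def correlate_fft(image, mask):
    """Cyclic correlation of the image with the mask via the FFT."""
    Q = np.fft.fft2(mask, s=image.shape)   # zero-pad the mask to the image size
    return np.fft.ifft2(np.fft.fft2(image) * np.conj(Q)).real

rng = np.random.default_rng(5)
image = rng.uniform(0, 255, (32, 32))
q = disk_mask(9, 4)
mean_map = correlate_fft(image, q)            # local mean at each position
mean_sq_map = correlate_fft(image ** 2, q)    # local mean of squared pixels
variance_map = mean_sq_map - mean_map ** 2    # local variance
```

Each entry of mean_map is the pixel mean inside the disk whose bounding window has its top-left corner at that position.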

[0159]
(Embodiment 6) 

This embodiment discloses a technique by which an image that is the
bilateral (left-right) mirror image of a template image can be efficiently
processed.

[0160]

This structure is shown in Fig. 19. That is, in addition to the 
structure of Fig. 1 of Embodiment 1, a symmetric vector generation unit 
24 is provided between the recording unit 5 and the multiplication unit 10. 
As in Embodiment 1, Formula 8 is used as the similarity value formula in
this embodiment.
[0161]

Next, a description will be given of how to treat a bilaterally
reversed template image. For example, if the template image of Fig.
20(a) is the original image, the template image obtained by bilaterally
reversing the original one is as shown in Fig. 20(b).
[0162]

The relationship between the edge normal direction vectors of these 
template images is expressed by the following formula. 
[Formula 20] 

: Edge normal direction vector of image resulting from subjecting original 
template image to bilateral reversal 
[0163]

The evaluation vector of the template image that has been 
bilaterally reversed is expressed by the following formula. 
[Formula 21] 

Evaluation vector of 
Evaluation vector of 



[0164] 

Since Fourier transformation satisfies the relation of
Formula 22, Formula 23 is obtained by applying Formula 22 to Formula
21.

[Formula 22 ] 
[Formula 23 ] 
[0165] 

In detail, the evaluation vector of the template image that has been
bilaterally reversed can be easily generated by, for example, reversing the
sign of the evaluation vector of the original template
image.

[0166] 

Therefore, the evaluation vector of the template image that has
been bilaterally reversed can be obtained simply by having the symmetric
vector generation unit 24 apply Formula 23 to the evaluation vector of
the original template image read from the recording unit 5 in Fig. 19.
[0167]

In short, there is no need to perform complex
processing, such as a procedure in which the image of Fig. 20(b) is
generated from the image of Fig. 20(a), and the evaluation vector is
calculated again from the image of Fig. 20(b).
[0168]

Thereby, the evaluation vector of the template image that has been 
bilaterally reversed can be generated without direct calculations, and 
processing speed can be improved. Additionally, recording capacity can 
be saved because the need to purposely store the template image that has 
been bilaterally reversed is obviated. 
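
The shortcut can be illustrated in the spatial domain by the following minimal sketch in Python with NumPy. It is given under assumptions: the edge normal direction vector is taken to be the image gradient, the evaluation vector is taken to be the double-angle vector (cos 2a, sin 2a) of the gradient direction with weak edges set to zero, and the division by "n" is omitted because it does not affect the relation. The function name evaluation_vector and the threshold value are hypothetical. Whether Formula 23 has exactly this form cannot be confirmed from the text.

```python
import numpy as np

def evaluation_vector(image, threshold=1e-6):
    """Double-angle evaluation vector (cos 2a, sin 2a) of the gradient direction."""
    gy, gx = np.gradient(image.astype(float))   # derivatives along y and x
    norm_sq = gx ** 2 + gy ** 2
    strong = norm_sq > threshold ** 2            # suppress very weak edges
    safe = np.where(strong, norm_sq, 1.0)        # avoid division by zero
    vx = np.where(strong, (gx ** 2 - gy ** 2) / safe, 0.0)   # cos 2a
    vy = np.where(strong, 2 * gx * gy / safe, 0.0)           # sin 2a
    return vx, vy

rng = np.random.default_rng(6)
template = rng.uniform(0, 1, (16, 16))
vx, vy = evaluation_vector(template)

# Long route: mirror the image, then recompute the evaluation vector.
wx, wy = evaluation_vector(np.fliplr(template))

# Shortcut: mirror the stored components and negate the sine component.
wx_short, wy_short = np.fliplr(vx), -np.fliplr(vy)
```

Under these assumptions the shortcut reproduces the recomputed vector exactly, so the mirrored template never has to be generated or stored.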

[0169] 
(Embodiment 7) 

In this embodiment, eye/eyebrow extraction processing is added to 
the face extraction processing described in Embodiment 3. 
[0170] 

As in Fig. 22(b), an eye/eyebrow candidate range can be roughly
extracted from the input image of Fig. 22(a) according to the processing
described in Embodiment 3.
[0171]

Point biserial correlation coefficient filters shown in Fig. 23(a)
through Fig. 23(d) are applied to each point of the image of this
eye/eyebrow candidate range, a map of point biserial correlation values is
then formed, and points where the correlation value in the map becomes
high are set as the eye center point 3002 and the eyebrow center point
3003, respectively.
[0172]

The point biserial correlation coefficient η is defined by the
following formula (reference: Multivariate Analysis Handbook, page 17,
Modern Mathematics).
[Formula 24] 

Overall range: addition of 1st range to 2nd range

Number of pixels of 1st range

n: Number of pixels of overall range

Average brightness level of 1st range
Average brightness level of 2nd range
S: Standard deviation of overall range
The value inside the square root is constant when the mask size is fixed.
[0173] 

Referring to Fig. 23(a) through Fig. 23(d), Fig. 23(a) shows the
positional relationship of all ranges, Fig. 23(b) shows an overall range
mask, Fig. 23(c) shows a 1st range mask, and Fig. 23(d) shows a 2nd
range mask.

[0174] 

If the filter shape according to this point biserial correlation
coefficient is formed as shown in Fig. 23(a), it is expected that the
eyebrow center 3002 and the eye center 3003 can be extracted as shown in
Fig. 22(c).

[0175] 

Next, filter processing according to the point biserial correlation 
coefficient will be described. 



[0176] 

First, main components of Formula 24 are expressed by the 
following formula. 

[Formula 25] 
wherein, 

M1: 1st range mask
M2: 2nd range mask 
Ma: Overall range mask 
I(x,y): Each pixel value of input image 
[0177]

The following formulas are obtained by subjecting each component 
to Fourier transformation. 
[Formula 26] 

[Formula 27] 
[Formula 28] 
wherein
[0178]

In order to perform this processing, the structure of Fig. 21 is
formed, for example. First, each mask of the overall range, the 1st range,
and the 2nd range is subjected to Fourier transformation by the orthogonal
transformation units 51 through 53.
[0179]

An input image and a result of the face extraction described in
Embodiment 3 are input to the eye/eyebrow candidate range extraction
unit 54 through the map processing unit 12. Based on these inputs, the
eye/eyebrow candidate range extraction unit 54 extracts only the
eye/eyebrow candidate range shown in Fig. 22(b) from Fig. 22(a). Data
concerning this eye/eyebrow candidate range is subjected, as it is, to Fourier
transformation by an orthogonal transformation unit 55, and is also
squared by a square unit 56 and then subjected to Fourier
transformation by an orthogonal transformation unit 57.
[0180]

Thereafter, data shown in Formula 27 is input through a
multiplication unit 58 to an inverse orthogonal transformation unit 62
preceding an η map forming unit 65. Likewise, data shown in Formula
26 is input to an inverse orthogonal transformation unit 63 through a
multiplication unit 60, and data shown in Formula 28 is input to an inverse
orthogonal transformation unit 64 through a multiplication unit 61. The
input data is subjected to inverse Fourier transformation by the inverse
orthogonal transformation units 62 through 64, and is output to the η map
forming unit 65.
[0181] 

Thereafter, the η map forming unit 65 performs a calculation
according to Formula 24 when receiving the data from the inverse
orthogonal transformation units 62 through 64, and outputs a map η(x,y) of
the point biserial correlation coefficient.
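
The overall flow of this filter processing can be sketched as follows in Python with NumPy. The sketch uses the standard definition of the point biserial correlation coefficient, (mean1 - mean2) / S × sqrt(n1 (n - n1)) / n, and its own decomposition into three mask sums. It is an assumption that this matches Formulas 24 through 28, and the function names and the rectangular masks are hypothetical.

```python
import numpy as np

def correlate_fft(image_spec, mask, shape):
    """Cyclic correlation of an image (given by its spectrum) with a mask."""
    mask_spec = np.fft.fft2(mask, s=shape)   # zero-pad the mask to the image size
    return np.fft.ifft2(image_spec * np.conj(mask_spec)).real

def point_biserial_map(image, m1, m2):
    """Map of the point biserial correlation for binary masks m1 and m2."""
    ma = m1 + m2                        # overall range mask
    n1, n = m1.sum(), ma.sum()
    spec = np.fft.fft2(image)
    spec_sq = np.fft.fft2(image ** 2)
    s1 = correlate_fft(spec, m1, image.shape)      # pixel sum over 1st range
    sa = correlate_fft(spec, ma, image.shape)      # pixel sum over overall range
    sq = correlate_fft(spec_sq, ma, image.shape)   # sum of squares over overall range
    mean1 = s1 / n1
    mean2 = (sa - s1) / (n - n1)
    std = np.sqrt(np.maximum(sq / n - (sa / n) ** 2, 1e-12))
    return (mean1 - mean2) / std * np.sqrt(n1 * (n - n1)) / n

rng = np.random.default_rng(7)
image = rng.uniform(0, 255, (24, 24))
m1 = np.zeros((6, 6))
m1[:3] = 1.0          # 1st range: upper half of the window
m2 = 1.0 - m1         # 2nd range: lower half of the window
eta = point_biserial_map(image, m1, m2)
```

Each entry of eta equals the ordinary correlation coefficient between the pixel values of the window anchored at that position and the 0/1 membership of the 1st range.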
[0182]

The eye/eyebrow center extraction unit 66 extracts two points
having high values from the map η(x,y) output by the η map forming unit
65, and outputs them as the eye center and the eyebrow center,
respectively.

[0183] 

If the filter is applied directly in the conventional manner, multiplication
must be performed roughly 15×15×N = 225N times, wherein "15×15" (pixels) is
the filter size of Fig. 23(a), and N is the number of pixels of the input image.

[0184]

In contrast, according to this embodiment, the number of product
calculations is roughly N + [(2γ-4)N+4] × 5 + 12N + N + N = 5N(2γ-1) + 20
times, wherein N = 2^γ. For convenience, the calculation of a square root
is assumed to be equal to one multiplication processing.
[0185] 

That is, (calculation amount of this embodiment) < (calculation
amount of the conventional technique) under the condition that γ is 22
or less.
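
This condition can be confirmed with the following short calculation in Python, which evaluates both product counts for N = 2^gamma and searches for the largest exponent at which this embodiment needs fewer products. The function names are hypothetical.

```python
def direct_products(N):
    """Direct filtering with a 15 x 15 filter: 225N products."""
    return 15 * 15 * N

def fft_products(N, gamma):
    """Product count of this embodiment, equal to 5N(2*gamma - 1) + 20."""
    return N + 5 * ((2 * gamma - 4) * N + 4) + 12 * N + N + N

# Largest exponent for which the FFT-based processing needs fewer products.
largest_gamma = max(g for g in range(1, 40)
                    if fft_products(2 ** g, g) < direct_products(2 ** g))
```

The search returns 22, in agreement with the condition stated above.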

[0186]

Usually, a huge image in which N is 2^22 or more is not used.
Therefore, it is understood that the image processing of this embodiment
requires fewer calculations than the conventional
processing, and is performed faster than the conventional
processing.

[0187]

Thus, according to this embodiment, the point biserial correlation 
coefficient filter processing can be more efficiently performed. 
[0188] 

The positions of other face parts, such as eye corners, mouth edges, 
nostrils, and irises, can be calculated by variously changing the masks 
shown in Fig. 23. 

[0189]

(Embodiment 8) 

This embodiment discloses a technique for expanding the function 
of the face extraction according to Embodiment 3 and extracting a mouth 
range, which is a facial organ, from a face image. 
[0190]
As described in Embodiment 3 with reference to Fig. 14, a mouth
candidate range shown in Fig. 25(b) can be extracted from the input image 
of Fig. 25(a). 

[0191] 

When the extracted mouth candidate range is projected onto the
Y-axis (i.e., when the total h of the pixel values along the X-axis direction
is plotted), a graph roughly as shown in Fig. 25(c) is obtained.
[0192] 

The total h is defined by the following formula. 
[Formula 29] 

I(x,y): Each pixel value of input image 
w: Width of mouth candidate range 
h(x,y): Projection value 
[0193]

In order to efficiently obtain this total by use of orthogonal
transformation, a mask shown in Fig. 25(d) is prepared. Formula 29 can
be rewritten as the following formula, which includes this mask.
[Formula 30]
Mask

[0194]

When this is subjected to Fourier transformation, the following 
formula is obtained. 
[Formula 31] 

[0195] 

That is, what is required to obtain this projection value is to subject 
the input image and the mask pattern to Fourier transformation, thereafter 
perform the calculation of Formula 31, and subject its result to inverse 
Fourier transformation. Thereby, the map "h(x,y)" of the projection 
value "h" can be obtained. 
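
The procedure can be sketched as follows in Python with NumPy. The sketch assumes that the mask is a single row of ones of width w, that the sum starts at (x, y) and runs to the right with cyclic wrap-around, and that a random array stands in for the mouth candidate range. The exact index range of Formula 29 is not reproduced here, and the function name projection_map is hypothetical.

```python
import numpy as np

def projection_map(image, width):
    """h(x, y): sum of `width` horizontally adjacent pixels starting at (x, y)."""
    mask = np.ones((1, width))   # one-row mask of ones
    spec = np.fft.fft2(image) * np.conj(np.fft.fft2(mask, s=image.shape))
    return np.fft.ifft2(spec).real

rng = np.random.default_rng(8)
region = rng.uniform(0, 255, (12, 20))   # stand-in mouth candidate range
h = projection_map(region, width=20)     # width equal to the range width
```

When the mask width equals the width of the range, every column of h holds the row totals, that is, the projection onto the Y-axis.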
[0196] 

From the above result, it is desirable to form the structure of Fig. 24,
for example. As shown in Fig. 24, the input image and the mask pattern
are subjected to Fourier transformation by the orthogonal transformation
units 25 and 26, respectively.
[0197]

Thereafter, the two transformed values are multiplied together by the
multiplication unit 27, and the product is subjected to inverse Fourier
transformation by the inverse orthogonal transformation unit 28. As a
result, the map h(x,y) of the projection value is obtained.
[0198]

The obtained map h(x,y) is developed by a projection data
extraction unit 29, and a maximum point extraction unit 30 calculates two
maximum points from this map and outputs the range between these
maximum points as the mouth range.
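The maximum point extraction can be illustrated with a short Python sketch. It assumes that the projection data have already been reduced to a one-dimensional profile (as in the graph of Fig. 25(f)) and that the two maximum points are the two highest local maxima of that profile; the name mouth_range and this peak criterion are assumptions made for illustration.

```python
import numpy as np

def mouth_range(profile):
    """Return (start, end): indices of the two highest local maxima of a 1-D profile."""
    p = np.asarray(profile, dtype=float)
    # Interior local maxima: above the left neighbour and not below the right one.
    idx = np.flatnonzero((p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:])) + 1
    if idx.size < 2:
        raise ValueError("profile needs at least two local maxima")
    top_two = idx[np.argsort(p[idx])[-2:]]   # the two highest peaks
    return int(top_two.min()), int(top_two.max())
```

For a profile such as [0, 1, 5, 2, 1, 3, 7, 2, 0, 1, 0], the two highest peaks lie at indices 2 and 6, so the range between them would be reported as the mouth range.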
[0199] 

The mask pattern can, of course, be subjected to Fourier
transformation beforehand as in Embodiment 1. If so, processing
performance can be concentrated on the input image, and processing
speed is improved.
[0200]

According to this embodiment, the position of the mouth, which is
a facial organ (more specifically, the positions of both the upper and lower
lips), can be accurately calculated.
[0201]

(Embodiment 9) 

This embodiment discloses a technique for expanding the function 
of the face extraction according to Embodiment 3, and applying 
predetermined processing to a face image according to first and second 
examples of this embodiment. 
[0202]

「First Example」

In this example, a digital watermark is embedded into an extracted
face image. That is, the structure of Fig. 26 is formed. A result-using
part 600 is provided at the subsequent stage of the map processing unit 12
of Fig. 11.

[0203]

The result-using part 600 includes a face image cutting-out unit 31
for separating an input image into a face image and a non-face image with
reference to the input image and the face position that is determined by the
map processing unit 12. The face image part is output from the face
image cutting-out unit 31 to a digital watermark embedding unit 32, and a
predetermined digital watermark is embedded into the face image part.
Any well-known digital watermark can be used here.
[0204]

On the other hand, the part excluding the face image is output from 
the face image cutting-out unit 31 directly to an image synthesizing unit 
33. 

[0205]

The image synthesizing unit 33 combines the face image into 
which the digital watermark is embedded with the part excluding the face 
image, and outputs an image into which the digital watermark has been 
embedded. 
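The flow of cutting out the face part, embedding a watermark into it, and recombining it with the untouched remainder can be sketched as follows. The specification only says that a well-known watermark may be used, so this sketch substitutes simple least-significant-bit embedding purely for illustration. The function names, the 8-bit grayscale image, and the rectangular face position given as (top, left, height, width) are all assumptions.

```python
import numpy as np

def embed_in_face(image, face_box, bits):
    """Embed `bits` into the least significant bits of the face region only."""
    top, left, height, width = face_box
    bits = np.asarray(bits, dtype=np.uint8)
    # Face image cutting-out: the rectangle given by the face position.
    face = image[top:top + height, left:left + width]
    if bits.size > face.size:
        raise ValueError("watermark does not fit into the face region")
    flat = face.flatten()                      # copy of the face pixels, row by row
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    # Image synthesizing: the non-face part is copied through unchanged.
    out = image.copy()
    out[top:top + height, left:left + width] = flat.reshape(face.shape)
    return out

def extract_from_face(image, face_box, n_bits):
    """Read back the first n_bits embedded by embed_in_face."""
    top, left, height, width = face_box
    face = image[top:top + height, left:left + width]
    return face.flatten()[:n_bits] & 1
```

Only pixels inside the face rectangle can change, and each changes by at most one brightness level.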

[0206]

Therefore, for an input image such as that of Fig. 27(a), an output
image such as that of Fig. 27(b) is obtained, in which a digital watermark
has been embedded into the face part.
[0207]

The name of a model to be photographed or the date of the 
photograph is suitable as a digital watermark. 
[0208]

According to this example, watermark data can be embedded easily
and in a concentrated manner into the face part, which is liable to be
falsified.
[0209]

「Second Example」

In this example, specific editing is applied only to an extracted face 
image. That is, the structure of Fig. 28 is formed. A result-using part 
601 is provided at the subsequent stage of the map processing unit 12 of 
Fig. 11. 

[0210]

The result-using part 601 includes the face image cutting-out unit
31 for separating an input image into a face image and a non-face image
with reference to the input image and the face position that is determined
by the map processing unit 12. The face image part is output from the
face image cutting-out unit 31 to an image correction unit 34, and
predetermined editing is applied to the face image part. Any well-known
editing can be used here.
[0211]

On the other hand, the part excluding the face image is output from
the face image cutting-out unit 31 directly to the image synthesizing unit
33.

[0212] 

The image synthesizing unit 33 combines the face image that has 
been edited with the part excluding the face image, and outputs an edited 
image. 

[0213] 

Therefore, for an input image such as that of Fig. 29(a), an output
image is obtained in which a face range has been extracted as in Fig. 29(b)
and only the face part has been edited as in Fig. 29(c).
[0214]
In an image of a person photographed against the light, for example,
the face part can be corrected so that the face color becomes lighter,
because the face is too dark and hard to see. The editing by the image
correction unit 34 is not limited to this, however, and the image may be
corrected arbitrarily.

[0215] 

According to this example, an image of only the face part can be 
easily and selectively corrected, or, in other words, image correction can 
be performed without exerting an influence on a part excluding the face. 
[0216] 

「Third Example」

In this example, an extracted face image is corrected so as to be easily
perceptible. That is, the structure of Fig. 30 is formed. First, a
result-using part 602 is provided at the subsequent stage of the map
processing unit 12 of Fig. 11.

[0217]

The result-using part 602 includes the face image cutting-out unit
31 for separating an input image into a face image and a non-face image.
A face image that has been cut out is output to a face internal range
extraction unit 35 and to the image correction unit 34.
[0218]

According to the procedure described in Embodiment 3, the face
internal range extraction unit 35 calculates the positions of main face parts,
such as the eyes, nose, and mouth, from the cut-out face image by use of a
template, and extracts a facial inner image (i.e., an image that is situated
inside the whole face). In this example, the facial inner image is an
image in a rectangular range that has a predetermined size and is centered
on the face center (e.g., the center of the nose). However, the size of the
rectangular range can be changed, and the center can be located at a
position slightly deviated from the face center.
[0219]

The extracted facial inner image is output to an image feature
extraction unit 36. The image feature extraction unit 36 calculates a
feature that is useful for correction of a face image. In this example, the
image feature extraction unit 36 calculates the brightness distribution of
the facial inner image, and outputs its result to a correction function
determining unit 37 in the form of a brightness histogram. The image
feature extraction unit 36 can, of course, output the minimum and
maximum brightness values in the brightness histogram to the correction
function determining unit 37.
[0220]

With reference to the feature input from the image feature
extraction unit 36, the correction function determining unit 37 determines
a correction function by which the contrast of the facial inner image
becomes clearer, and outputs this correction function to the image
correction unit 34.
[0221]

In this example, the image feature extraction unit 36 outputs the
brightness histogram as mentioned above, and therefore the correction
function determining unit 37 calculates a correction function so that the
minimum brightness that appears in the histogram is mapped to the minimum
brightness of the dynamic range, and so that the maximum brightness
is mapped to the maximum brightness thereof. Thereafter, the correction
function determining unit 37 outputs the correction function to the image
correction unit 34.
[0222]

The horizontal axis of the brightness histogram of Fig. 30 indicates 
brightness values, and it will be understood that only a part of the dynamic 
range that can express brightness is used. Therefore, the correction 
function determining unit 37 determines the correction function so as to 
fully use the dynamic range, and thereby the image correction unit 34 can 
correct the face image so that the image can be easily discerned, or, more 
specifically, so that it has abundant gradations in appearance. As a
result of this correction, the face image reliably becomes easy to view,
though there might be a situation in which a part other than the face image
is whitened or darkened.
[0223]

Since the correction function is determined on the basis of only the
facial inner image, the correction is never misled by a non-face part.
Therefore, after correction, it can be guaranteed that the viewability of the
face is improved. 
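A minimal Python sketch of such a correction function is given below. It assumes a linear stretch, a brightness range of 0 to 255, and a facial inner image supplied as a NumPy array; the specification does not fix the form of the function, so these choices and the name make_correction_function are illustrative assumptions.

```python
import numpy as np

def make_correction_function(inner, out_min=0.0, out_max=255.0):
    """Return a function that linearly stretches brightness.

    The minimum and maximum brightness found in `inner` (the facial inner
    image) are mapped to `out_min` and `out_max`, the assumed dynamic range.
    """
    lo = float(np.min(inner))
    hi = float(np.max(inner))
    if hi == lo:
        gain = 1.0   # flat region: nothing to stretch (choice made for this sketch)
    else:
        gain = (out_max - out_min) / (hi - lo)

    def correct(image):
        stretched = (np.asarray(image, dtype=float) - lo) * gain + out_min
        # Pixels outside the face may fall outside the range and saturate.
        return np.clip(stretched, out_min, out_max)

    return correct
```

The returned function can be applied either to the cut-out face image or to the whole input image. In the latter case, pixels darker or brighter than anything in the facial inner image are clipped, which corresponds to the whitening or darkening of non-face parts mentioned above.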
[0224]

Although Fig. 30 is drawn as if correction were made only to the
cut-out face image, similar correction can be made to the whole input
image.

[0225]

Further, as a feature to be sought by the image feature extraction 
unit 36, an index that represents the lightness/darkness of an image, such 
as brightness, can be used as described above. In addition, a chroma 
average or a hue average can be used. 
[0226]

If the chroma average is used, the correction function determining
unit 37 can output, for example, a chroma amplification coefficient to the
image correction unit 34, and, if the hue average is used, the correction
function determining unit 37 can output, for example, a hue rotation
angle to the image correction unit 34.
[0227]

Herein, a pale face image can be more vividly corrected when the 
chroma average is used as a feature, and, when the hue average is used, a 
less reddish face image can be corrected to be more reddish. In either case, 
the face image can be made more natural. Additionally, a combination of 
two or more of brightness, chroma average, or hue average can be used as 
a feature. 
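How a chroma amplification coefficient and a hue rotation angle might act on a single pixel can be sketched with Python's standard colorsys module. The sketch uses HSV saturation as a stand-in for chroma, which is an assumption, since the specification does not name a color space. The function name and parameter names are hypothetical.

```python
import colorsys

def adjust_pixel(rgb, chroma_gain=1.0, hue_shift_deg=0.0):
    """Scale saturation and rotate hue of one RGB pixel with components in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift_deg / 360.0) % 1.0    # hue rotation angle
    s = min(1.0, s * chroma_gain)            # chroma amplification coefficient
    return colorsys.hsv_to_rgb(h, s, v)
```

With chroma_gain greater than 1, a pale skin tone becomes more saturated while its brightness value is kept. A nonzero hue_shift_deg moves the tone around the hue circle, for example toward red.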

[0228]
[EFFECT OF THE INVENTION]

According to the present invention, patterns can be extracted stably
and at high speed with less influence from differences in photographic
conditions and background.
[0229]

Patterns of a plurality of templates can also be extracted at high
speed.

[0230]
The similarity value map can be amplified and a part similar to the
template can be stably detected from the input image.
[0231]

Attributes inside the image can be added to the similarity value, 
and can be processed more clearly and more efficiently. 
[0232]

Data of the template image that have been bilaterally reversed can
be generated indirectly, and the data amount of the template image can be
reduced.

[0233]

An object to be processed can be reduced rationally.
[0234]

Point biserial correlation coefficient filters can extract the center
point of the eyes and eyebrows.
[0235]

A mouth range, which is a facial organ, can be specified accurately.
[0236]

Watermark data can be concentratively embedded into a face part
that is liable to be falsified.
[0237]

Image correction can be selectively performed on only the face part
of an image.

[0238]

According to the feature of only the facial inner image, the 
viewability of the face image can be infallibly improved. 
[DESCRIPTION OF DRAWINGS] 

[Fig. 1]

Fig. 1 is a block diagram of an image processing apparatus 
according to Embodiment 1 of the present invention. 
[Fig. 2] 

Fig. 2 (a) is a view showing a template image. 
Fig. 2 (b) is a view showing an edge extraction image (x
component) of the template image.
Fig. 2 (c) is a view showing an edge extraction image (y
component) of the template image.
[Fig. 3] 

Fig. 3 is an explanatory drawing for explaining the compression 
processing of an evaluation vector. 
[Fig. 4] 

Fig. 4 (a) is a view showing an input image. 

Fig. 4 (b) is a view showing an edge extraction image (x 
component) of the input image. 

Fig. 4 (c) is a view showing an edge extraction image (y 
component) of the input image. 
[Fig. 5] 

Fig. 5 (a) is a view showing a map of similarity values. 
Fig. 5 (b) is a view showing the input image. 
[Fig. 6] 

Fig. 6 is an explanatory drawing for explaining a positive/negative 
reversal of an inner product. 



[Fig. 7] 

Fig. 7 (a) is a view showing a template image. 
Fig. 7 (b) is a view showing an input image. 
[Fig. 8] 

Fig. 8 (a) is a view showing a template image.
Fig. 8 (b) is a view showing a template image.
[Fig. 9] 

Fig. 9 is a block diagram of an image processing apparatus 
according to Embodiment 2 of the present invention. 
[Fig. 10] 

Fig. 10 (a) is a graph showing conjugate properties.
Fig. 10 (b) is a graph showing conjugate properties.
[Fig. 11] 

Fig. 11 (a) is a block diagram of a template image processing part 
according to Embodiment 3 (first example) of the present invention. 

Fig. 11 (b) is a block diagram of a template image processing part
according to Embodiment 3 (second example) of the present invention.
[Fig. 12]

Fig. 12 is a view showing an input image. 
[Fig. 13] 

Fig. 13 (a) is a view showing a template image. 

Fig. 13 (b) is a view showing an edge extraction image. 

Fig. 13 (c) is a view showing an enlargement/reduction template
image.

Fig. 13 (d) is a view showing a template image. 

Fig. 13 (e) is a view showing an edge extraction image. 

Fig. 13 (f) is a view showing an enlargement/reduction template
image.

[Fig. 14] 

Fig. 14 is a view showing a face extraction result. 
[Fig. 15] 

Fig. 15 is a block diagram of an image processing apparatus 
according to Embodiment 4 of the present invention. 
[Fig. 16]
Fig. 16 is a view showing a peak pattern.
[Fig. 17] 

Fig. 17 is a block diagram of an image processing apparatus 
according to Embodiment 5 of the present invention. 
[Fig. 18] 

Fig. 18 (a) is a view showing a template image. 
Fig. 18 (b) is a view showing a mask pattern. 
[Fig. 19] 

Fig. 19 is a block diagram of an image processing apparatus 
according to Embodiment 6 of the present invention. 
[Fig. 20] 

Fig. 20 (a) is a view showing an original template image. 
Fig. 20 (b) is a view showing a template image that has been 
subjected to a bilateral reversal. 
[Fig. 21] 

Fig. 21 is a block diagram of a part of the image processing
apparatus according to Embodiment 6 of the present invention.
[Fig. 22]

Fig. 22 (a) is a view showing an input image. 

Fig. 22 (b) is a view showing the eyes/eyebrow candidate range. 

Fig. 22 (c) is a view showing a recognition result. 

[Fig. 23] 

Fig. 23 (a) is a view showing a filter shape according to 
Embodiment 7 of the present invention. 

Fig. 23 (b) is an explanatory drawing of an overall range mask. 
Fig. 23 (c) is an explanatory drawing of a range-1 mask.
Fig. 23 (d) is an explanatory drawing of a range-2 mask.
[Fig. 24] 

Fig. 24 is a block diagram of a part of an image processing 
apparatus according to Embodiment 8 of the present invention. 
[Fig. 25] 

Fig. 25 (a) is a view showing an input image. 

Fig. 25 (b) is an explanatory drawing of a mouth candidate range. 

Fig. 25 (c) is a graph of projection values.
Fig. 25 (d) is an explanatory drawing of a mask pattern.
Fig. 25 (e) is a view showing a projection-value map image. 
Fig. 25 (f) is a graph of projection values. 
[Fig. 26] 

Fig. 26 is a block diagram of a part of an image processing 
apparatus according to Embodiment 9 (first example) of the present 
invention. 

[Fig. 27] 

Fig. 27 (a) is a view showing an input image. 
Fig. 27 (b) is a view showing an output image. 
[Fig. 28] 

Fig. 28 is a block diagram of a part of the image processing 
apparatus according to Embodiment 9 (second example) of the present 
invention. 

[Fig. 29] 

Fig. 29 (a) is a view showing an input image.
Fig. 29 (b) is a view showing an output image.
[Fig. 30]

Fig. 30 is a block diagram of a part of the image processing 
apparatus according to Embodiment 9 (third example) of the present 
invention. 

[DESCRIPTION OF SYMBOLS]

1 edge extraction unit
2 evaluation vector generation unit
3 orthogonal transformation unit
4 compression unit
5 recording unit
6 edge extraction unit
7 evaluation vector generation unit
8 orthogonal transformation unit
9 compression unit
10 multiplication unit
11 inverse orthogonal transformation unit
12 map processing unit
13 conjugate compression unit
14 conjugate reconstruction unit
15 enlargement/reduction unit
16 addition unit
17 orthogonal transformation unit
18 compression unit
19 recording unit
20 mask pattern generation unit
21 orthogonal transformation unit
22 compression unit
23 recording unit
24 symmetric vector generation unit
25, 26 orthogonal transformation unit
27 multiplication unit
28 inverse orthogonal transformation unit
29 projection data extraction unit
30 maximum point extraction unit
31 face image cutting-out unit
32 digital watermark embedding unit
33 image synthesizing unit
601 result-using part
34 image correction unit
51 orthogonal transformation unit
52 orthogonal transformation unit
53 orthogonal transformation unit
54 eye/eyebrow candidate range extraction unit
55 orthogonal transformation unit
56 square unit
57 orthogonal transformation unit
58 multiplication unit
59 reduction unit
60 multiplication unit
61 multiplication unit
62, 63, 64 inverse orthogonal transformation unit
65 η map forming unit
66 eye/eyebrow center extraction unit
100 template image processing part
101, 102 template image processing part
200 input image processing part
300 peak pattern processing part
400 mask pattern processing part
500 projection image processing part
600 result-using part
[DOCUMENT NAME] ABSTRACT
[ABSTRACT] 

[PROBLEM TO BE SOLVED] It is an object of the present invention 
to provide an image processing method and an image processing apparatus 
capable of obtaining an accurate, clear recognition result and capable of 
performing high speed processing. 

[SOLUTION] An image processing method for detecting an object 
from an input image by use of a template image. The image processing 
method includes a step of inputting a specified image with respect to both 
a template image and an input image, a step of calculating an edge normal 
direction vector of the specified image, a step of generating an evaluation 
vector from the edge normal direction vector, a step of subjecting the 
evaluation vector to orthogonal transformation, a step of performing a 
product sum calculation of corresponding spectral data with respect to 
each evaluation vector that has been subjected to orthogonal 
transformation and has been obtained for each of the template image and 
the input image, and a step of subjecting the result to inverse orthogonal
transformation and generating a similarity value map. The formula of the
similarity value, the orthogonal transformation, and the inverse orthogonal
transformation each have linearity. This enables pattern recognition in
which the component of the similarity value does not undergo
positive/negative reversal due to variations in the brightness of the
background.
[SELECTED FIGURE] Fig. 1

Fig. 1: template image → 1 edge extraction; 2, 7 evaluation vector generation; 4, 9 compression; 5 recording; input image → 6 edge extraction; 11 inverse FFT; 12 map processing → result
Fig. 2: (a) template image; (b) X component; (c) Y component
Fig. 3: object to be processed
Fig. 4: (a) input image; (b) X component; (c) Y component
Fig. 5(a): maximum value
Fig. 7(a), (b): background; specified object; template image; input image
Fig. 9: template image → 1 edge extraction; 2, 7 evaluation vector generation; 4, 9 compression; 5 recording; input image → 6 edge extraction; 11 inverse FFT; 12 map processing → result; 13 conjugate compression; 14 conjugate restoring
Fig. 11(a): 1 edge extraction; 2 evaluation vector generation; 4 compression; 5 recording; template image → 15 enlargement/reduction
Fig. 11(b): 1 edge extraction; 2 evaluation vector generation; 4 compression; 5 recording; template image → 15 enlargement/reduction
Fig. 15: template image → 1 edge extraction; 2, 7 evaluation vector generation; 4, 9, 18 compression; 5, 19 recording; input image → 6 edge extraction; 11 inverse FFT; 12 map processing → result; peak pattern P → 17 FFT
Fig. 17: template image → 1 edge extraction; 2, 7 evaluation vector generation; 4, 9, 22 compression; 5, 23 recording; input image → 6 edge extraction; 11 inverse FFT; 12 map processing → result; 20 mask pattern generation
Fig. 18: (a) template image; (b) mask pattern q; value 0 part; value 1/N part
Fig. 19: template image → 1 edge extraction; 2, 7 evaluation vector generation; 4, 9 compression; 5 recording; input image → 6 edge extraction; 11 inverse FFT; 12 map processing → result; 24 symmetric vector generation
Fig. 21: overall range mask (Ma); 1st range mask (M1); 2nd range mask (M2); 12 map processing; input image → 54 eye/eyebrow candidate range extraction; 56 square; 62, 63, 64 inverse FFT; 65 η map forming; 66 eye/eyebrow center extraction → center position
Fig. 23: (a) don't care; 1st range; 2nd range; 1st range; (b) overall mask (Ma); (c) 1st range mask (M1); (d) 2nd range mask (M2)
Fig. 24: input image → 25 FFT; mask pattern → 26 FFT; 28 inverse FFT; 29 projection data extraction; 30 maximum point extraction → mouth range output
Fig. 25: (c) maximum point; mouth range; projection value; (d) value 0; value 1; (e) mouth candidate position; (f) maximum point; mouth range; projection value
Fig. 26: 12 map processing → result output (face position); input image → 31 face image cutting-out; face image → 32 digital watermark embedding; non-face image → 33 image synthesizing
Fig. 27(b): digital watermark embedded part
Fig. 28: 12 map processing → result output (face position)
Brightness histogram: vertical axis: frequency; horizontal axis: brightness value
Correction function graph: vertical axis: output brightness; horizontal axis: input brightness